SEX DIFFERENCES IN TYPE OF EDUCATIONAL
MASTERY 


FREDERICK H. LUND 


Temple University 


For the most part psychologists have viewed with distrust ideas 
setting forth special differences between the sexes. They have been 
wary of these ideas because of the popular and traditional tendency 
toward exaggeration in this field. The intense emotional interest 
surrounding the subject of sex may account for this disposition toward 
exaggeration. Wherever such interest exists we may expect to see 
dramatization in the direction of emotional preference. Thus, ana- 
tomical differences being at the root of sex attraction, there has been a 
tendency to extend these differences and to assume that the sexes vary, 
not only in physical and anatomical traits, but in psychic, mental, and 
spiritual traits as well. Aware of these proclivities, the psychologist
has understandably been on guard whenever such concepts have been
advanced.

It was not with the idea of discovering new or mysterious sex 
differences that the present study was undertaken. In fact, its original 
purpose had nothing to do with sex differences, being concerned solely 
with the relation between high school and college grades, and between 
school marks and actual mastery of subject-matter as measured by 
objective tests. That interest should finally have centered about sex 
differences is only because, with the assembling of the data, these 
differences turned out to be their most significant feature. 

No attempt is made in the present study to determine whether the 
differences under consideration are native or acquired. For one thing 
this distinction is, at best, a difficult one to make. Then again it is 
differences as we find them under normal surroundings which must be 
of chief concern. 
SUMMARY OF EARLIER FINDINGS


Interest in sex differences has centered about four major problems: 
Differences (1) in general intelligence, (2) in variability, (3) in scholar- 
ship, and (4) in special mental capacities. 

Regarding the first of these (general intelligence), Thorndike,¹
Pressey,² Lincoln,³ Burnham,⁴ Wreschner,⁵ and other investigators,
are agreed that differences, if they exist, are not great enough to be 
important. That girls between the ages of nine and fifteen should tend 
on the average to surpass boys may be due entirely to the fact that they 
mature earlier than boys. 

The hypothesis that boys are more variable than girls has been
under investigation particularly by Leta S. Hollingworth,⁶ who finds
no significant difference in this respect. She also presents data to
show that the apparently greater frequency of amentia, as well as of
retardation in school work, among boys may best be attributed to
different standards set up for boys and girls, and to the greater
likelihood that native limitations in girls out of school will go
undetected. Confirmation for Hollingworth’s position is found in the
work of Pyle,⁷ Cornell,⁸ and Henmon.⁹

Pronounced differences in scholarship, on the other hand, are
reported by Book,¹⁰ Lehman,¹¹ Patterson,¹² and others. Their findings
indicate that girls are, in general, markedly superior in scholarship.

However, this superiority is not equal in all subjects. It is most 
apparent in the language courses and in English and literature. The 
boys make their best showing in the sciences and in mathematics. 
But even in these they do not surpass the girls so far as grades are 
concerned. 


THE PRESENT PROBLEM 


Lincoln,¹³ in his summary of “Sex Differences in School Achievement,”
states that girls are definitely superior in all school subjects.
What we expect to show is that the correctness of this statement
depends upon the measure used in determining educational superiority:
grades or school marks, objective test records, or long-range
educational tests. Only in the case of the first of these is the
superiority of the girls really definite and pronounced. On the
objective tests the difference is greatly reduced, and on the
long-range tests (educational tests administered after a considerable
lapse of time) the position of the boys and the girls is reversed.

Statistical and educational test records for three hundred thirty- 
eight students covering a period of five years constitute the raw 
material upon which the present analysis is based. The students 
represented in this study were freshmen at Bucknell University during 
the school-year 1929-1930. In the spring of that year the very compre- 
hensive educational tests of the Carnegie Foundation were admin- 
istered. These tests covered the subject-matter of the entire high 
school curriculum, and might, because of their relation to the material 
learned in high school, be called “long-range tests,” as distinguished 
from the “short-range tests,” or those given in connection with the
courses. 

In addition to the Carnegie tests the students under consideration 
were given the American Council on Education Intelligence Test, the 
Iowa Mathematics Placement Test, the Bucknell English Placement 
Test, and the language tests of the Columbia Research Bureau. The
results of all these tests were compared with the academic records for
the four years in high school.

TABLE I

                                    Q1        Q2        Q3        Q4
Test                              M    F    M    F    M    F    M    F
Carnegie tests                   25   27   23   30   24   25   28   18
Carnegie minus English tests     27   21   23   29   23   28   27   22
Mathematics                      24   27   27   21   26   27   23   25
Intelligence                     22   33   22   30   27   25   29   14
High school grades               14   48   23   30   29   16   34    6
College grades                   16   45   23   30   29   18   32    7
Language tests                   16   47   23   27   29   19   32    7
English placement                16   46   24   28   28   17   32    9
Carnegie English tests           17   41   22   32   30   14   31   13

Distribution of the unselected group (freshmen in attendance at Bucknell
University 1929-1930) in various test performances. The figures appearing
in each quartile indicate the percentage of boys (M) and the percentage
of girls (F) falling within a given quartile; Q1 is the highest quartile.

In order to make the desired comparison between the relative
standing of the boys and the girls in the various tests, it was necessary
to secure an order-of-merit distribution for each test. In securing
this distribution no attention was paid to sex. Having marked off
the quartiles in these distributions, the number of boys and the number
of girls falling within each quartile were then noted. In Tables I and
II these numbers have been reduced to percentages according to the
number of boys and the number of girls represented. Thus, the
numbers under M (boys) and the numbers under F (girls) each total
one hundred. The distributions for the first and the fourth quartiles
of Table I are shown graphically in Figs. 1 and 2. In the case of the
high school and college grades the same procedure was followed as in
the case of the test records. The average grade attained during the
four years of high school, and the average grade attained while in col-
lege, became the basis for the order-of-merit and quartile distributions.

TABLE II

                                    Q1        Q2        Q3        Q4
Test                              M    F    M    F    M    F    M    F
Carnegie tests                   34   17   23   26   22   28   21   29
Carnegie minus English tests     35   16   27   22   21   30   17   32
[label illegible]                32   19   26   22   25   25   17   34
[label illegible]                28   22   25   26   25   25   22   27
Carnegie English tests           22   28   24   26   27   23   27   23
[label illegible]                19   31   24   26   23   27   34   16
[label illegible]                18   34   21   28   28   21   33   17

Distribution of the selected group (boys and girls paired for scholastic
records) in various test performances. The figures appearing in each
quartile indicate the percentage of boys (M) and the percentage of girls
(F) falling within a given quartile; Q1 is the highest quartile.

Fig. 1. Graphic presentation of the percentage of boys and the percentage
of girls falling within the first quartile (Carnegie test, mathematics,
intelligence, high school grades, college grades, language test, English
placement).
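The order-of-merit and quartile procedure described above can be sketched as follows. This is a minimal illustration, not the author's own computation: it assumes one score per student, Q1 as the highest quartile, and ties broken arbitrarily (by input order). The function name `quartile_percentages` and the toy scores are hypothetical and are not data from the study.

```python
from collections import Counter


def quartile_percentages(records):
    """Sex-blind order-of-merit quartiles, reported separately by sex.

    records: list of (sex, score) pairs, sex being "M" or "F".
    Returns {"M": [q1, q2, q3, q4], "F": [...]}, each entry being the
    percentage of that sex falling in the quartile; Q1 is the highest.
    """
    # Rank everyone together, best score first; sex plays no part here.
    # sorted() is stable, so tied scores keep their input order.
    ranked = sorted(records, key=lambda r: r[1], reverse=True)
    n = len(ranked)
    counts = {"M": Counter(), "F": Counter()}
    for rank, (sex, _score) in enumerate(ranked):
        quartile = 1 + (4 * rank) // n  # ranks 0..n-1 split into Q1..Q4
        counts[sex][quartile] += 1
    # Reduce counts to percentages of each sex, so each list totals 100.
    result = {}
    for sex, c in counts.items():
        total = sum(c.values())
        result[sex] = [100 * c[q] / total if total else 0.0 for q in (1, 2, 3, 4)]
    return result


# Hypothetical toy scores, not data from the study.
toy = [("M", 90), ("F", 85), ("M", 80), ("M", 70),
       ("M", 60), ("F", 50), ("F", 40), ("F", 30)]
print(quartile_percentages(toy))
# {'M': [25.0, 50.0, 25.0, 0.0], 'F': [25.0, 0.0, 25.0, 50.0]}
```

Because the percentages are taken within each sex, the M and F figures for a test each total one hundred, as in Tables I and II, even when the class contains unequal numbers of boys and girls.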

In Table I we have represented the results for the unselected group,
in Table II the results for the selected group. The unselected group
represents the entire freshman class, the selected group only those
boys and girls (about half the class) who could be paired for academic
records. The reason for this division is that the girls then in attend-
ance at Bucknell were, as measured by academic records, a much
more select group. This higher selection is due to the fact that a
smaller percentage of the girls applying for admission is accepted,
with the result that the girls must actually have a better high school
record in order to secure admission.

Fig. 2. Graphic presentation of the percentage of boys and the percentage
of girls falling within the fourth quartile.


ANALYSIS AND INTERPRETATION 


From Tables I and II it will be seen that the girls make their best 
showing in high school and college grades, and in the English and 
language tests, while the boys make their best showing on the Carnegie 
tests and the mathematics and intelligence tests. This difference in 
relative standing is the same for both the selected and the unselected 
groups. It is to the selected group, however, that our attention should 
be chiefly directed, since the boys and girls of this group are equally 
representative, and should, for this reason, provide a better basis for 
comparison. 
That the girls should do better in the English and language courses,
and the boys better in the mathematics and science courses, is con- 
sistent with earlier findings. Of greater interest and significance to 
the present analysis is, first, the fact that the girls, according to their 
intelligence rating, make a better showing than the boys in their 
school work, and, second, the fact that the scholastic superiority of 
girls, as measured by school marks, is almost if not entirely eliminated 
when retests for the same material are given after an interval of time. 
We can hardly say, then, that the girls are superior in educational 
achievement, since on the Carnegie retests their superiority is no 
longer apparent. In fact, in our selected group, in which academic 
records have been equated for, the boys are distinctly superior— 
twice as many boys as girls reaching the highest quartile. 

How shall we account for this difference? Why should the girls 
do better on the immediate or short-range tests for classroom material 
and poorer on the long-range memory tests for the same material? 
Is the difference due merely to different standards in grading, the girls 
being graded more leniently? Or shall we look for something more 
fundamental—a difference in interest and motivation, or a difference 
in type of mastery, the mastery of the boys being of such a character 
as to allow for better retention? No doubt the matter of grading, of 
interest and motivation, and of type of educational mastery, are all 
involved. 

As to grades, it is altogether possible that different standards in 
this respect, or greater leniency in marking the girls, might be a factor. 
But it is quite improbable that so great a difference could be accounted 
for on this basis. For one thing their superiority in connection with 
classroom work, and as indicated by their grades, is almost as great 
on certain objective tests. For another, the difference is about the 
same whether we consider high school grades or college grades. More 
important, then, would seem to be the problem of motivation and the 
possible difference in type of school mastery. To make clearer how 
these factors may influence classroom results, a more detailed analysis 
seems desirable. 


TYPES OF EDUCATIONAL MASTERY AND TYPES OF
CONNECTION-FORMING 


Mastery of classroom material may be merely one of memory or 
it may be one of understanding. The definitions, phrases, and formulae,
may be learned purely by rote, or they may be reacted to logically and 
relationally. The connections formed in learning may be simply of
the sensory-motor type, similar to those involved in the learning of 
the alphabet or the multiplication table, or they may be of the associa- 
tional or ideational variety, the words and phrases being reacted to as 
symbols, with awareness of meaning and implied content. In one 
case learning remains isolated, going very little beyond the verbaliza- 
tions as such; in the other case it is relational, that is, the new is tied 
up with and interpreted in terms of the old. The rote or sensory- 
motor connections are of the serial variety, the logical of the non- 
serial or relational variety. The test as to whether mastery is of one 
type or another would, accordingly, be found in whether or not the 
individual could restate (using other words) and illustrate the content. 
It should be understood, however, that while this distinction between 
rote and logical connections may be made, the latter are not in their 
origin different from the former, since connections which today give 
sense and seem logical, were once without meaning and quite unrelated. 
In other words, logical connections become so only through repetition 
and experience and are really only older sensory-motor connections. 

In the classroom it is obviously connection-forming of the logical 
type which is desirable. Merely to have learned to verbalize the 
material is not significant. What matters is whether or not the mate- 
rial has been so learned that it will function in the life of the individual 
and will be an aid to future adjustment. 

But to return to our problem. Is it possible that mastery of school 
subjects is more verbal and less logical on the part of the girls, and that 
this may account, in a measure at least, for the difference in retention? 
Several lines of evidence point to an affirmative answer: 

First, rote connections suffer more through disuse and lapse of time 
than do logical connections; and the girls, apparently forgetting more, 
may have depended more upon the former and less upon the latter. 
Second, the school subjects showing the greatest sex difference, so 
far as loss through lapse of time is concerned, were the subjects in 
which verbal mastery does not suffice (mathematics and the sciences), 
while those showing about equal loss were the linguistic subjects (Eng- 
lish and literature). Third, the outlook and interests of the girls are 
very different from those of the boys. Being less apt to have any 
use for many of the subjects taught, they are less motivated in the
direction of genuine understanding with respect to these subjects. 

No attempt is being made to see anything more than relative dif- 
ferences in the respects enumerated. We do not think that there is 
anything more than a relative difference in the use made of logical
and rote connections so far as the sexes are concerned. Nor do we 
believe that some school subjects involve only rote and verbal connec- 
tions and others only logical and relational learning. In fact, it would 
be quite impossible in many situations to determine which type pre- 
dominates. Yet, notwithstanding this practical difficulty, the dis- 
tinction between the two forms of learning and of connection-forming 
may nevertheless be made. One may go through a series of experiences, 
or make a series of responses, and later recall these in their proper 
sequence, but so long as the reactions are merely these, so long as the 
connections formed are only of the serial or rote variety, they are 
obviously different from those involved in relating the experience to the 
past, or seeing the new in terms of the old, as must necessarily be the 
case when we react with consciousness of meaning. 

This difference in the extent to which the sexes may make use of 
the two types of connection-forming is scarcely to be attributed to 
any native condition. If indeed it exists, we should regard it as a prod- 
uct of environment. As we have observed, the training and the out- 
look of the sexes are different. Most girls are looking forward to 
marriage and to a life in which things social are of chief importance. 
Their interest in high school and in college is already dominated by this 
outlook to the extent that interest in grades, which belongs to the 
general sphere of appearances and social approval, is regarded as more
important than the possible bearing which the material learned may 
have on later life. Their strong social motivation makes them work 
harder for grades, and, to the extent that these frequently may be 
attained through pure verbal mastery, their work tends to remain on 
this level. The boys, on the other hand, are trained from an early 
age to think in terms of practical needs and practical adjustments. 
They are to be the bread-winners and are made to realize that their 
success is bound up with mastery of the physical rather than the social 
environment. Accordingly, grades mean less, understanding more. 
For the girl, the significance of a college course is apt to be social, for 
the boy, it is more apt to be vocational. 


BEARING UPON EDUCATIONAL METHODS 


To the extent that the data presented in Tables I and II indicate
that purely linguistic performance may be all that is necessary in order 
to pass courses and make good grades, these data may be said to con- 
stitute something of a criticism of our educational methods. There 
can be no doubt but that many teachers identify their teaching problem
with the verbal recital of a particular set of material, with the result 
that their instruction tends to deteriorate into a kind of ritual or drill in 
repeating certain phrases and definitions. To drift into such methods, 
and to feel that something has been accomplished when the student 
has learned to respond glibly to a fixed set of stimuli—this, no doubt, is 
the easy thing, and, too often, the only thing of which the teacher is 
capable. But the value of such instruction, of such ritual and routine, 
must be almost negligible. To have instilled in the individual a set of 
thinking habits, to have trained him in systematic and logical treat- 
ment of specific situations, this would seem to be more important. 
Mental habits of this kind may be inculcated by teaching the individ- 
ual—not definitions and formulas—but principles and the application 
of principles. If there must be drill it should be in understanding and 
not in verbalization. To the extent that this ideal is carried into effect 
the real aim of teaching may be attained. We shall be training the 
individual not so much to meet the ordinary situation in the conven- 
tional way, as to meet the extraordinary situation in an unusual way. 


SUMMARY 


1. Upon retesting college freshmen for subject-matter covered in 
high school the girls, by comparison with high school grades, were 
found to make a poorer showing than the boys. 

2. While forty-seven per cent of the girls in the unselected group 
(the entire freshman class) were in the upper quartile in high school 
grades, only twenty-seven per cent were in the upper quartile on the 
retest. 

3. Of the selected group (boys and girls paired for scholastic 
records) only half as many girls as boys were in the upper quartile on 
the retest. 

4. In the case of boys and girls matched for intelligence the girls 
make higher grades than the boys. 

5. Whether we consider high school records or retest records, the 
girls do relatively better in English and the languages, the boys 
better in the sciences and mathematics. 

6. The apparent difference between the sexes in amount of loss 
through lapse of time is greatest in mathematics and the sciences, least 
in the languages and English. 

Since the difference between the boys and the girls in scholastic and 
retest records seems too great to be due to different standards in
marking boys and girls, we believe that at least a partial explanation may be
found in differences in type of school mastery and differences in interest 
and motivation. It is possible that the mastery of the girls is more 
verbal or rote and less logical than that of the boys. This is supported 
by the fact that purely rote and sensory-motor connections suffer more 
through lapse of time than do logical connections, and by the fact that 
the difference in loss is greatest in those subjects in which purely 
linguistic mastery is inadequate. Interests and outlook of the girls 
being different, they are less motivated in the direction of logical 
mastery in many school subjects. Moreover, linguistic mastery of 
subject-matter is frequently all that is necessary in order to secure good 
grades. 
NEW EVIDENCE OF THE SUPERIOR RETENTION
OF TYPEWRITING TO THAT OF SUBSTITUTION 


FRANK N. FREEMAN AND ETHEL M. ABERNETHY 


University of Chicago 


In a previous number of this Journal the authors¹ reported an
experiment in which a comparison was made between the retention 
of typewriting and that of substitution. In learning typewriting 
the subjects, none of whom could use the machine, “learned to write 
a short paragraph with keys covered with blank caps and a drawing 
of the keyboard placed in view of the learner.” The substitution
“consisted in translating the same paragraph into digits, following 
a key containing the letters of the alphabet and corresponding numbers 
placed before the learner.” Thus the subject-matter in the two 
cases, and, possibly, the number of associations, were the same. 
The chief difference was that one set of associations involved overt 
movements whereas the other was ideational. The schedule in both 
cases was the same, consisting in first learning in spaced periods, 
relearning after two weeks and second relearning after eight more 
weeks. 

The chief findings of this experiment were, first, that the retention
of the associations in typewriting and in substitution was equal on
the first relearning after two weeks, and, second, that the retention
was very different after eight more weeks. There was an actual gain
(that is, an improvement) in errors, trials and time between the first
and second relearning in typewriting, while there was a marked loss in
each item in substitution.

McGeoch² has raised the question whether the superior retention
in typewriting on the second relearning may be due to the effect of 
the first relearning. He says: 


The authors conclude that the presence of overt movements facilitates reten- 
tion over a long period, although it does not measurably do so over a shorter one. 
This experiment is a highly important attack upon the problem, although unfor- 
tunately the experimental conditions render uncertain an interpretation of the 
results after the ten weeks period. The groups which relearned after ten weeks 
had also relearned after two weeks. The fact of the difference in saving after ten 
weeks may be, as Freeman and Abernethy think, a function of the greater time
interval, or it may be that a relearning, which constitutes distributed practice,
after two weeks is more efficacious for fixating typewriting for later retention
than it is in fixating substitution.

¹ Freeman, Frank N. and Abernethy, Ethel M.: Comparative Retention of
Typewriting and of Substitution with Analogous Material. The Journal of
Educational Psychology, Vol. XXI, 1930, pp. 639-647.

² McGeoch, John A.: The Acquisition of Skill. The Psychological Bulletin,
Vol. XXVIII, 1931, p. 456.


While one might argue on theoretical grounds that the first relearn- 
ing, which constitutes a review, would be expected to benefit the 
less permanent type of learning to a greater extent than the more 
permanent type, the authors preferred to repeat the experiment 
in order to clear up this question in interpretation rather than rest 
the case on theoretical considerations. Accordingly a second experi- 
ment was made, in all essential respects like the first, except that 
the first relearning was omitted. The relearning period was introduced 
ten weeks after the learning. 

The experiment was conducted, as before, in the psychological 
laboratory of Queens College, Charlotte, North Carolina, by the 
junior author.¹ The subjects were of the same type as before, women
college students, selected on the same basis. The same material
was used and the same conditions observed. Both groups were
taught individually, in fifty-minute periods, one period a day until
the learning was completed. In order to duplicate the automatic 
check on errors in typewriting, the learners in the substitution group 
were notified of each error as it occurred. 

The results are presented in three tables corresponding to the
tables in the original study. Table I gives the average scores on
learning and relearning of both groups. It is evident that there is a
very large saving in all three measures in the case of typewriting,
and a relatively small saving in substitution. As shown in Table II,
the gains are statistically highly reliable in the case of typewriting
but not in the case of substitution, except in time. The difference
is presented in directly comparable terms in the saving scores, shown
in Table III. The saving is expressed as a percentage of the scores
on first learning.

TABLE I. THE AVERAGE SCORES IN LEARNING AND RELEARNING TEN WEEKS
AFTER, IN TYPEWRITING AND IN SUBSTITUTION. TWENTY-FIVE SUBJECTS IN
EACH GROUP

                           Errors                 Repetitions           Time in seconds
Group         Measure      Mean            SD     Mean          SD      Mean                 SD
Typewriting   Learning     161.08 ± 19.90  147.55  22.12 ± 1.10  8.18   6763.72 ± 376.97   2794.47
              Relearning    44.88 ± 5.20    38.57  11.80 ± 0.62  4.56   2822.32 ± 116.62    864.47
Substitution  Learning     100.16 ± 5.38    39.90  10.60 ± 0.39  2.88   2803.64 ± 138.20   1024.50
              Relearning    96.04 ± 3.79    28.10   9.24 ± 0.29  2.14   1987.48 ± 72.51     537.50

TABLE II. DIFFERENCES BETWEEN THE MEANS, LEARNING AND RELEARNING

               Substitution           Typewriting
Measure        D         PE of D      D          PE of D
Errors         -4.12      6.58        -116.20     20.57
Repetitions    -1.36      0.48         -10.32      1.26
Time         -816.16    156.07       -3941.40    394.60

¹ Miss Thelma Stone, a teacher of typewriting, assisted the experimenter with
the typewriting group, as in the first experiment.
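The saving scores and the reliability statements above can be checked arithmetically from Table I. The saving formula follows the text: the first-learning score minus the relearning score, as a percentage of the first-learning score. The text does not state how the ± figures or the PE of D column were obtained; the sketch assumes the ± values are probable errors of the mean, 0.6745·SD/√25, and that the PE of a difference combines the two PEs as for uncorrelated means, since these formulas reproduce the printed figures. The threshold of four probable errors used to label a difference "reliable" is likewise an assumption for illustration, not a value given by the authors. All function names are hypothetical.

```python
import math

N = 25  # subjects per group, from the title of Table I

# (learning mean, learning SD, relearning mean, relearning SD), from Table I
TABLE_I = {
    "typewriting": {
        "errors": (161.08, 147.55, 44.88, 38.57),
        "repetitions": (22.12, 8.18, 11.80, 4.56),
        "time": (6763.72, 2794.47, 2822.32, 864.47),
    },
    "substitution": {
        "errors": (100.16, 39.90, 96.04, 28.10),
        "repetitions": (10.60, 2.88, 9.24, 2.14),
        "time": (2803.64, 1024.50, 1987.48, 537.50),
    },
}


def pe_mean(sd, n):
    """Probable error of a mean (assumed): 0.6745 * SD / sqrt(N)."""
    return 0.6745 * sd / math.sqrt(n)


def saving(learning, relearning):
    """Saving on relearning as a percentage of the first-learning score."""
    return 100 * (learning - relearning) / learning


def compare(group, measure, n=N):
    m1, sd1, m2, sd2 = TABLE_I[group][measure]
    d = m2 - m1  # negative: fewer errors, repetitions, seconds on relearning
    # PE of the difference, treating the two means as uncorrelated (assumed)
    pe_d = math.hypot(pe_mean(sd1, n), pe_mean(sd2, n))
    return {"D": d, "PE_D": pe_d, "ratio": abs(d) / pe_d,
            "saving": saving(m1, m2)}


if __name__ == "__main__":
    CRITICAL_RATIO = 4.0  # assumed threshold, not stated in the text
    for group, measures in TABLE_I.items():
        for measure in measures:
            r = compare(group, measure)
            verdict = "reliable" if r["ratio"] >= CRITICAL_RATIO else "not reliable"
            print(f"{group:12} {measure:11} D={r['D']:9.2f} "
                  f"PE_D={r['PE_D']:7.2f} D/PE_D={r['ratio']:5.2f} "
                  f"saving={r['saving']:6.2f}%  {verdict}")
```

Run as a script, this should agree with the printed D, PE of D, and saving columns to within rounding, and under the assumed threshold it labels every typewriting difference reliable and, for substitution, only the time difference.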

It is obvious that the saving on relearning, and hence the perma-
nence, is very large in the case of typewriting and much smaller in
the case of substitution. This agrees with the results of the first
experiment. In the present case, however, the difference is much
more pronounced, as may be seen by comparing the saving scores
in this experiment (Table III) with those in the previous experiment,
reproduced in Table IV. The Table IV saving scores are based on
relearning ten weeks after the first learning.

TABLE III. SAVING SCORES BASED ON MEANS OF TABLE I

Group           Errors    Repetitions    Time
Typewriting      72.14       46.65       58.27
Substitution      4.11       12.83       29.11

TABLE IV. SAVING SCORES IN THE PREVIOUS EXPERIMENT, BASED ON
COMPARISON OF LEARNING WITH RELEARNING TEN WEEKS AFTER

Group           Errors    Repetitions    Time
Typewriting      91.20       74.88       80.54
Substitution     76.85       52.23       65.86

This comparison indicates that the first relearning, instead of being
responsible for the superior retention of typewriting, actually reduced
that superiority. Comparatively early reviews are necessary in the case
of ideational learning, whereas they may be dispensed with without
entailing very great loss in the case of motor learning. The second
experiment strengthens the conclusion that, for activities comparable
to those studied in this experiment, relatively simple associations
which involve overt movements are more permanent than those which
are more largely ideational in character.
THE DIFFICULTY OF A TEST AND ITS DIAGNOSTIC
VALUE* 


THELMA GWINN THURSTONE 


This study was made to answer a problem in mental test theory, 
namely—how is the diagnostic value of a test affected by the degree of 
difficulty of the test questions? The separate questions in group 
intelligence tests are usually fairly simple, and a high percentage of the 
subjects who attempt them answer them correctly. Tests are made 
harder by decreasing the time allowed to work on them, rather than 
by increasing the difficulty of the separate test questions. The low 
correlations sometimes obtained between intelligence test scores and 
college scholarship may be due, in part, to the fact that the tests are 
too easy and should be made harder. Between the extremes of zero 
scores and perfect scores, there is probably no way to determine ration- 
ally how diagnostic the tests of various degrees of difficulty may be. 
Is a test in which the average score is fifty per cent more discriminative 
than a test in which the average score is seventy-five per cent, and how 
does the diagnostic value of a test on which the average score is twenty- 
five per cent compare with the diagnostic value of easier tests? Is 
there an optimal degree of difficulty of a test which it is possible to 
determine, and if so, what is it? These are the questions which 
suggested the present study and which the experiments have been 
planned to answer. 

The study was undertaken largely as a statistical problem, but it 
is hoped that the results may in some way contribute to the theory of 
psychological measurement. 

In order to determine experimentally what relation exists between 
the degree of difficulty of a test and its diagnostic value, it is necessary 
to have as reliable a measure as possible of some particular ability of a 
group of subjects—the criterion, and to know the test performance of 
each subject on tests of various degrees of difficulty and to correlate 
scores in each of these tests with the criterion. 
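The design just outlined (a criterion, tests of various degrees of difficulty, and a correlation of each test with the criterion) can be sketched in a few lines. This is a hypothetical illustration, not the author's procedure: it assumes responses are recorded as a subjects-by-items matrix of 1 (correct) and 0 (wrong), that an item's difficulty is the proportion of subjects answering it correctly, and that the separate tests are formed by grouping items into bands of that proportion. As in the study, the criterion here is the total score over all items, so it contains each band's items. The function names and the toy matrix are invented for illustration.

```python
import math


def pearson(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def difficulty(responses):
    """Proportion of subjects answering each item correctly.

    responses[i][j] is 1 if subject i got item j right, else 0."""
    n = len(responses)
    return [sum(row[j] for row in responses) / n
            for j in range(len(responses[0]))]


def band_validities(responses, bands):
    """Correlate each difficulty band's subtest with the criterion.

    bands: list of (low, high) proportion-correct limits, high exclusive.
    The criterion is each subject's total over all items."""
    p = difficulty(responses)
    criterion = [sum(row) for row in responses]
    result = {}
    for low, high in bands:
        items = [j for j, pj in enumerate(p) if low <= pj < high]
        scores = [sum(row[j] for j in items) for row in responses]
        if len(set(scores)) < 2:  # empty band or no spread: r undefined
            continue
        result[(low, high)] = pearson(scores, criterion)
    return result


# Hypothetical 4-subject, 4-item toy matrix, not data from the study.
toy = [[1, 1, 1, 1],
       [1, 1, 1, 0],
       [1, 1, 0, 0],
       [1, 0, 0, 0]]
print(difficulty(toy))
print(band_validities(toy, [(0.0, 0.5), (0.5, 1.01)]))
```

The upper limit 1.01 in the usage example is only a convenience so that items every subject answered correctly fall inside the last band.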

It is of course impossible to say, before a test is given, exactly 
how hard it will be for the group that takes it, although thorough 
standardizations of some test material are available and can be made 
use of in constructing examinations. The material used in this study 
consisted of one thousand spelling words. I had no greater interest
in spelling tests than in tests of other abilities, because my problem was
one of test theory; spelling tests were selected because of the large
number of spelling scales available and because of the greater ease of
collecting, giving, and scoring a large number of test questions. The
words used ranged in difficulty from very easy words which all the
subjects could spell to very hard words which none could spell correctly.

* From the Psychological Laboratory, the University of Chicago.

Even with the large number of standardizations of spelling words 
such as those given in the Iowa Spelling Scales, it was not possible to 
construct the tests wholly from such sources on account of the small 
number of difficult words included in those lists. It was necessary to 
select a large number of words which in my judgment would be difficult 
for my subjects to spell. Fortunately, when the papers had all been 
scored, I found that I had a fairly equal distribution of words at various 
levels of difficulty. Table I gives the actual distribution. 

The criterion of spelling ability which has been used in this study
is the total score on the one thousand words making up the tests. These
scores ranged from one hundred sixty-two to eight hundred twenty-nine,
and in the opinion of the writer constitute a reliable criterion.
There may be some objection to using this score as a criterion since it 
contains the scores in the separate tests which were later correlated 
with it. However, the number of words in these tests was usually only 
twenty-five and never more than fifty, and therefore could not have 
seriously affected the total score. If intelligence test scores or arith- 
metic test scores or some other criterion had been used, the correlations 
would probably have been lower, and these lower correlations would 
have been due to the fact that spelling tests are not as diagnostic of 
intelligence or of arithmetical ability, rather than to any great differ- 
ence in the reliability of the criterion. 

The tests were given to one hundred sixth-grade children in four 
sixth-grade rooms of the Baltimore, Maryland, public schools in the 
spring of 1924. About fifty other children took a part of the test, but 
no records were used for pupils who were absent and did not take all ten 
tests. 

It would hardly be possible to give a subject one thousand spelling words at one sitting on account of the time required and the fatigue which would result. For this reason the one thousand words were arranged in ten tests of one hundred words each, the selection being made in random order so as to avoid making a test either too hard or too easy as compared with the others. Even had we been able to know beforehand the exact number of subjects who would be able to spell each word correctly, it would not have been desirable to arrange the words in order according to their difficulty. This arrangement is necessary for the statistical solution of the problem, but must be made after the subjects have completed their work. If all the hard words were given to the subjects at one time, they would be too discouraged to make an effort.

The ten tests were given one each day for ten consecutive school 
days, from May 12 to May 23 inclusive in the spring of 1924. The 
regular teachers administered the tests immediately after school was 
assembled in the morning. Dr. J. L. Stenquist, Director of Research 
in the Baltimore Public Schools, held a conference with the four 
teachers who assisted and explained the purpose of the tests. He was 
very successful in securing their cooperation and the teachers were most 
helpful. They followed a memorandum giving complete directions for 
administering the tests so that the tests were always given under the 
same conditions. There was no practice on the words during the 
experiment. 

All the scoring was done by the author. The children’s papers 
were marked by drawing a line before each misspelled word. The total 
number of correctly spelled words was written at the top of each blank. 
Special cards were prepared for recording the scores. One card was 
made for each word, showing the response of each child, whether right 
or wrong, and the total number of children who spelled it correctly. 
The cards thus contained exactly the same data as the children’s 
original papers except that on the papers the responses of an indi- 
vidual were grouped together, while on the cards the responses to a 
single word were grouped together. This arrangement was necessary 
since we wanted to be able to construct tests of any specified degree of 
difficulty. 

The data on the cards were then transferred to Powers cards and 
arranged so that it was possible by means of the electric sorting machine 
to answer questions such as: 

What score would subject number one make on a test in which all 
the words vary in difficulty from twenty per cent to twenty-nine per 
cent error? And, 

What score would subject number three make on a test in which all 
the words vary in difficulty from thirty per cent to thirty-four per cent 
error? 

The words were grouped in twelve piles according to the percentage of correct spellings: zero per cent, ten piles with a range of ten per cent each, and one hundred per cent. Each student’s scores on the words in the first and last groups were known to be zero and seven respectively. Fifty cards were selected at random from each of the groups of words, and by the use of the electric sorting machine, each student’s score in each test of fifty words was calculated.

TABLE I. DISTRIBUTION OF WORDS ACCORDING TO DIFFICULTY

Percentage of          Number of
correct spellings      words
0                      22
1-4                    49
5-9                    44
10-14                  39
15-19                  46
20-24                  48
25-29                  44
30-34                  45
35-39                  45
40-44                  44
45-49                  51
50-54                  48
55-59                  49
60-64                  38
65-69                  41
70-74                  32
75-79                  45
80-84                  49
85-89                  75
90-94                  71
95-99                  61
100                    7
Total                  993*

* Seven homonyms were discovered in scoring the papers. These were omitted, leaving the total number of words nine hundred ninety-three.
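The card sorting described above amounts to regrouping a children-by-words table of right and wrong marks. The following is a rough modern sketch of that bookkeeping. It assumes the marks are held in a Python dictionary keyed by word, which is a hypothetical layout; the author worked with hand-made cards and Powers cards. The function names and the seeded random draw are likewise illustrative choices and not part of the original study.

```python
import random


def difficulty_band(pct_correct: float) -> str:
    """Label a word with one of the twelve groups of the first experiment:
    '0', ten ten-per-cent bands ('1-9', '10-19', ..., '90-99'), or '100'."""
    if pct_correct == 0:
        return "0"
    if pct_correct == 100:
        return "100"
    low = int(pct_correct // 10) * 10
    return f"{max(low, 1)}-{low + 9}"


def band_test_scores(responses, words_per_test=50, seed=0):
    """responses maps each word to a list of 0/1 marks, one per child
    (the content of one per-word card). Returns {band: [score per child]}
    for a test drawn from each band. A band with words_per_test words or
    fewer is used whole; otherwise words_per_test words are sampled."""
    n_children = len(next(iter(responses.values())))
    bands = {}
    for word, marks in responses.items():
        pct = 100 * sum(marks) / n_children
        bands.setdefault(difficulty_band(pct), []).append(word)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    scores = {}
    for band, words in bands.items():
        chosen = words if len(words) <= words_per_test else rng.sample(words, words_per_test)
        scores[band] = [sum(responses[w][c] for w in chosen) for c in range(n_children)]
    return scores
```

For the first experiment one would call `band_test_scores(responses, words_per_test=50)`. The 0 and 100 per cent groups are smaller than fifty words, so they are used whole. This is why every child's score on them is known in advance.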

We then had a measure of the spelling ability of each child in his 
score on nine hundred ninety-three words and twelve separate test 
scores for each child. The twelve tests were of known degrees of 
difficulty, and comparison of the relations between the criterion and the 
separate test scores gave an answer to the problem we had set. Pear- 
son coefficients of correlation were calculated for each of the tests and 
the criterion. These coefficients are given in Table II. 
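The sketch below shows the Pearson coefficient as it is used here: one difficulty band's test scores are correlated with the criterion, which is each child's total score. The handling of zero variance is an assumption, chosen to match the .000 entries reported for the 0 and 100 per cent tests. Strictly speaking, r is undefined when every child makes the same score.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # No spread (e.g. a test every child fails): r is undefined;
        # report 0.0, as the tables do for the 0 and 100 per cent tests.
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlations_with_criterion(band_scores, criterion):
    """band_scores: {band label: [score per child]}; criterion: [total per child]."""
    return {band: pearson_r(scores, criterion) for band, scores in band_scores.items()}
```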

Inspection of Table II shows that neither the extremely easy tests 
nor the extremely hard tests are the most diagnostic measures of spell- 
ing ability. The most diagnostic tests are found in the range of 
medium difficulty. This relation is shown even more clearly in Chart 1
than in the table. The most striking characteristic of the curve is its 
steepness at each end. Since we obtain zero correlation between the 
criterion and a test which is so hard that all subjects make zero per 
cent on it, it is quite surprising that scores on a test only slightly easier, 
one in which scores on the separate items run from one per cent to nine 
per cent, should be so closely correlated with the criterion as indicated 
by a correlation coefficient of .687. It is also unexpected that scores 
in a test in which the percentages of correct spellings on separate 
words vary only from ninety per cent to ninety-nine per cent should 
bear such a close relation to spelling ability as shown by the correlation 
coefficient of .677. 


TABLE II. PEARSON COEFFICIENTS OF CORRELATION BETWEEN TESTS AND TOTAL SCORES

Difficulty of test           r
(percentage of correct       (coefficient of
spelling)                    correlation)
0                            .000
1-9                          .687
10-19                        .778
20-29                        .926
30-39                        .978
40-49                        .974
50-59                        .934
60-69                        .947
70-79                        .906
80-89                        .826
90-99                        .677
100                          .000


A better picture of the relation between the difficulty of a test and 
its diagnostic value is obtained if the range of difficulty of the items 
making up the tests is restricted. It was for this purpose that a second 
experiment was carried out in which the tests consisted of words rang- 
ing in difficulty only five per cent. Including the tests at zero per cent 
and one hundred per cent levels of difficulty, there are twenty-two 
tests in this series, each test consisting of twenty-five words. 

Again Pearson coefficients of correlation were calculated to show 
the relation between separate test scores and spelling ability. The 
results of these calculations are given in Table III, and are also repre- 
sented graphically in the Chart. 

These results again show a closer relation than might have been 
expected between the criterion of spelling ability and scores in easy 
tests and scores in hard tests. The two experiments show a slight
difference in the level of difficulty at which the highest correlations are 
found. This difference, however, is so small that it is probably due to 
errors of sampling rather than to any significant dependence of the 
correlation upon the range of difficulty in the test items. 

Inspection of the correlation tables indicated that the relation 
was really closer than the correlation coefficients showed because of the 
decided non-linearity of the regression, criterion on test, for the easy 
tests and for the hard tests. In the middle ranges of difficulty the 
regression of criterion on test scores is rectilinear, but at either end of 
the range, the regression is clearly curvilinear. This result could have 
been predicted, although the extent of the curvilinearity could not have 
been foretold. In an extremely hard test we should expect no differ- 
entiation in ability among those who make low scores, but we should 
expect that only a few very good students would succeed in making 
high scores on these tests. Similarly, if we give a very easy test to a 
group of children, we should expect the test to have diagnostic value 
only in the lower range of scores. In the case of our spelling words we 
should say that only a very poor student would fail to spell the very 
easy words. We might expect, however, that a test of average 
difficulty would have approximately the same diagnostic value through- 
out the whole range of scores, and hence, that the regression, criterion 
on test, for such a test would be rectilinear. The correlation ratio is a 
statistical measure of the closeness of relation between variables and is 
used instead of the correlation coefficient when the regression is non- 
linear. Correlation ratios were computed for all the tests used in the 
second experiment, and the results are shown in Table III. 
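The correlation ratio can be sketched in the same style. This version groups children by their exact score on the test. It then measures how much of the criterion's variance lies between those groups. Grouping by exact score, rather than by the class intervals of a correlation table, is an assumption of the sketch. With many distinct scores and few children per score, it will tend to overstate η.

```python
from collections import defaultdict
from math import sqrt


def correlation_ratio(test_scores, criterion):
    """Correlation ratio (eta) of criterion on test: the square root of the share
    of criterion variance lying between groups of children with equal test scores."""
    groups = defaultdict(list)
    for t, y in zip(test_scores, criterion):
        groups[t].append(y)
    grand_mean = sum(criterion) / len(criterion)
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    total = sum((y - grand_mean) ** 2 for y in criterion)
    return sqrt(between / total) if total else 0.0
```

Take the made-up data `correlation_ratio([0, 0, 1, 1, 2, 2], [1, 1, 5, 5, 1, 1])`. Here the Pearson r is exactly zero, while η is 1. The criterion depends perfectly, but not linearly, on the test score. This is the curvilinear situation described for the very easy and very hard tests.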

The evidence from the correlation ratios is conclusive that tests 
of medium difficulty are more diagnostic than tests which are extremely 
hard or extremely easy. It is very surprising, however, that tests 
which are so very easy or so very hard are as diagnostic as this experi- 
ment has shown them to be. 


CONCLUSIONS 


Even if the results of these experiments are applicable only to sixth-grade children and spelling tests, the evidence is very clear that there is a definite relationship between the degree of difficulty of a test and its diagnostic value. Up to a percentage of error equal to about fifty per cent, the diagnostic value of a test increases as its difficulty increases, but from that point on, the diagnostic value becomes lower as the test becomes harder. For the solution of our problem in test theory the correlation ratios are more important, but the practical conclusions drawn are based rather on the correlation coefficients, because a linear regression is so much easier to use for purposes of prediction.

Twenty-five words seem to be enough to give a reliable differentia- 
tion between the abilities of a group if the words are of the proper 
difficulty. It is recommended that words which can be spelled 
correctly by thirty per cent to seventy per cent of the students should 
be used. 

It seems probable that the results of these experiments have a wider significance than that just discussed. It would of course be desirable to carry out similar experiments using different test material, subjects of a different age, and a criterion entirely independent of the tests. It seems likely, however, that in the preparation of examination material, teachers should be more careful to include harder questions than they are accustomed to use and not try to make their tests easy enough for a large percentage of their students to make grades of eighty per cent to one hundred per cent. It seems fairly safe to guess that better differentiation between the abilities of a group can be obtained with a test in which the average percentage of error is about fifty per cent and in which the difficulty of the separate questions ranges from about thirty per cent to seventy per cent successes than can be obtained from a test in which the percentage of error is only from zero per cent to twenty per cent, as is at present the case in most school examinations and mental tests.

[Chart 1. Relation between difficulty of a test and its validity. Vertical axis: correlation between test and criterion. Horizontal axis: difficulty of test in terms of per cent of right answers.]

TABLE III. COEFFICIENTS OF CORRELATION AND CORRELATION RATIOS FOR TESTS OF KNOWN DIFFICULTY

Difficulty (percentage    r (coefficient     η (correlation
of correct answers)       of correlation)    ratio)
0                         .000               .000
1-4                       .540               .720
5-9                       .705               .788
10-14                     .812               .859
15-19                     .809               .892
20-24                     .892               .929
25-29                     .905               .925
30-34                     .924               .944
35-39                     .917               .948
40-44                     .931               .949
45-49                     .955               .960
50-54                     .952               .958
55-59                     .931               .940
60-64                     .912               .937
65-69                     .895               .910
70-74                     .875               .904
75-79                     .818               .891
80-84                     .837               .869
85-89                     .800               .863
90-94                     .641               .721
95-99                     .499               .528
100                       .000               .000

The results of these experiments agree so well with the common-sense expectation that about fifty per cent error should be the optimal difficulty of a test that the question may be asked whether this result could not have been predicted rationally. The writer has given a great deal of thought to this problem and believes that it has no rational solution except on the basis of assumptions which she does not feel justified in making.

Another question which may be raised is whether a test composed of items from a wider range of difficulty might not be more diagnostic than a test in which the range of difficulty is restricted. Such a study had been planned, but the distribution of correlation coefficients at various levels of difficulty seemed to indicate that no significant differences would be obtained unless words extremely hard or extremely easy were included with words of medium difficulty. The inter-test correlations have not been computed, but they would be high, and it is probable that a test composed of items selected so as to represent [...] would be less diagnostic than a test [...] at the level of optimal correlation found in these experiments.
EVALUATION OF METHODS OF EVALUATING TEST
ITEMS


THEO. F. LENTZ, JR. 
Washington University, St. Louis, Mo. 


AND 


BERTHA HIRSHSTEIN 
New York University 


AND 


F. H. FINCH 


University of Minnesota 


The importance of technique for selecting the best items for inclusion in
a test becomes greater and more apparent as measurement research 
increases. A chain is reputedly only as strong as its weakest link; 
while the weakest item in a test is not the sole determining factor of 
the test’s strength, the analogy is suggestive, and we find it highly 
important to eliminate the weakest items. How best to identify these 
weak links is, however, a more difficult matter. 

Two types of methods of test item selection have been advanced, 
the theoretical and judgmental on the one hand, and the statistical 
on the other. This paper is concerned with the statistical methods, that
is, the correlating of the various items with some objective criterion,
which may be the scores on the test as a whole, or scores obtained 
independently of the test under construction. In the experiments 
herein described, the purpose has been to increase reliability, yet it 
may be observed in passing that all these methods can, with slight or no 
modification, be utilized for evaluating items for validity. 

Several long methods have been formulated which can be used 
for this statistical selection or evaluation of items, that is, for 
obtaining the correlation of test items with the test as a whole. 
In dealing with the relationship between a dichotomous variable
and a variable with a larger number of categories, bi-serial r is probably
the most commonly employed, and while it is applicable in the
situation here being considered, the amount of work it requires limits
its usefulness.
Seven short methods not found in the standard books on statistics
have recently been considered,* and four of them experimentally 
evaluated. These four methods may be referred to as 


1. The Upper vs. Lower Third Method (U.L.). 

2. The Overlapping Method (Vincent) (O.L.). 

3. The McCall Method (McC). 

4. The Summation of Agreements Method (Lentz) (S.Ag.). 


The materials used in testing these methods consisted of the reac- 
tions of 211 college and university students to a conservatism test 
which is being developed by Lentz.† This test is made up of one
hundred fifty statements of opinion, with which the subjects being 
tested are asked to agree or disagree. Eighty-eight of these subjects 
were students in general and educational psychology at Washington 
University during the regular school years of 1928-1929, and 1929- 
1930. Forty-eight were students in the 1929 Summer Session of the 
same University. Thirty-eight were students at Blackburn College in 
1928-1929. Thirty-seven were advanced students in psychology at 
Colgate, 1928-1929. Each student was given a score equal to the 
number of items or opinions to which he reacted conservatively. 

The method used for evaluating these methods is as follows: By 
each of the four methods mentioned above, one hundred best items 
were selected out of the one hundred fifty of the original test. Each 
set of one hundred items was then tested for reliability. The method 
giving the one hundred items of highest reliability is then to be 
adjudged the best method. In other words, the best method is the one 
giving the greatest increase in the reliability of the test. 
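The evaluation procedure just described can be sketched under a few assumptions. Each method assigns every item a value. The k best items are kept (k = 100 of 150). The kept set is scored in odd and even halves, and the half-test correlation is stepped up with the Spearman-Brown formula. The article does not say whether "odd" and "even" refer to positions in the chosen list or to original item numbers. This sketch uses positions in the chosen list, and all function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; assumes both score lists vary."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r, factor):
    """Estimated reliability of a test lengthened by `factor`."""
    return factor * r / (1 + (factor - 1) * r)


def select_best(item_values, k, larger_is_better=True):
    """Indices of the k best items (use larger_is_better=False for O.L.)."""
    order = sorted(range(len(item_values)), key=lambda i: item_values[i],
                   reverse=larger_is_better)
    return sorted(order[:k])


def odd_even_reliability(responses, items, target_length=None):
    """responses[p][i] is 1 for a pass (conservative reaction), else 0.
    Correlates scores on the 1st, 3rd, 5th, ... chosen items with scores on
    the 2nd, 4th, 6th, ..., then steps the half-test correlation up to
    target_length items (default: the number of chosen items)."""
    odd, even = items[0::2], items[1::2]
    odd_scores = [sum(r[i] for i in odd) for r in responses]
    even_scores = [sum(r[i] for i in even) for r in responses]
    r_half = pearson_r(odd_scores, even_scores)
    length = target_length or len(items)
    return spearman_brown(r_half, length / len(odd))
```

To estimate for one hundred fifty items from a hundred-item selection, one would pass `target_length=150`. As a check on the formula, stepping a coefficient of .89 up by a factor of 1.5 gives about .92. That is close to the .93 reported later for the U.L. items, and the difference is possibly due to rounding of the .89.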


EXPERIMENTING WITH THE U.L. METHOD 


The seventy persons with the highest total scores, highest third, and 
the seventy persons with lowest scores, lowest third, were selected as 
the criterion group. The number of conservative reactions on each 
item was counted separately for each group. A conservative reaction 
on this test is considered the same as a pass on an ordinary test since the 
total number of conservative reactions constitutes the criterion score.





* Finch, F. H.: Techniques for Evaluating Test Items. 

Hirshstein, Bertha T.: Evaluation of Methods of Evaluating Character Test 
Items. Unpublished Master’s Theses, Washington University, St. Louis, Mo. 
June, 1930. 

† Lentz, Theodore F., Jr.: Utilizing Opinion in the Measurement of Character.
New Journal of Social Psychology. Nov., 1930. 










346 The Journal of Educational Psychology 


The evaluation of each item is, then, the difference between the number
of passes for the upper group and the number of passes for the
lower group. For example, we might find that fifty individuals in
our group with the highest criterion (conservatism) scores mark a
particular item conservatively, whereas thirty-five in our lower group
(lowest conservatism scores) mark the same item conservatively. The
evaluation for that particular item then is fifty minus thirty-five or 
fifteen. So by the U.L. method, those items with the greatest number 
of passes in the upper group in excess of the lower are considered the 
best items, since they discriminate most between the two groups. 
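A minimal sketch of the U.L. computation follows. It assumes a hypothetical layout in which `responses[p][i]` is 1 when person p reacts conservatively to item i and `totals[p]` is that person's criterion score. The `fraction` parameter reflects the open question, raised at the end of this paper, of thirds versus fifths. Ties at the group boundaries are broken by input order, which is an arbitrary choice. With the paper's own figures, an item marked conservatively by fifty persons in the upper group and thirty-five in the lower group receives 50 - 35 = 15.

```python
def upper_lower_values(responses, totals, fraction=1/3):
    """U.L. value of every item: passes in the top `fraction` of persons
    (ranked by total score) minus passes in the bottom `fraction`."""
    n = len(totals)
    k = max(1, round(n * fraction))                      # 211 persons -> 70 per group
    ranked = sorted(range(n), key=lambda p: totals[p])   # ties keep input order
    lower, upper = ranked[:k], ranked[-k:]
    n_items = len(responses[0])
    return [sum(responses[p][i] for p in upper) - sum(responses[p][i] for p in lower)
            for i in range(n_items)]
```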


EXPERIMENTING WITH THE O.L. METHOD 


This method has been studied by Vincent.* Here the total con- 
servatism score for the individual is again considered the criterion 
score. It is necessary in this method to find for each item the mean 
(or median) of the criterion scores (scores on the test as a whole) for all 
the persons who mark the item conservatively. Then the criterion 
scores of the group who failed the item, i.e., marked it radically, are
studied. The percentage of persons in this failing or radical group
whose criterion scores equal or exceed the mean for the passing group is
calculated. If the mean of the criterion scores of the passing group is
found to be 72.56 and twenty-five of the seventy-five persons who
failed the item are found to have criterion scores higher than 72.56, the
evaluation of the item will be 33 1/3 per cent. In this method the lower the
percentage of overlapping, the better the item. 
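The overlapping computation can be sketched with the same hypothetical `responses`/`totals` layout. The passing group's centre may be the mean or the median, as the text allows. A failing person counts as overlapping when his total equals or exceeds that centre. In the paper's example, 25 of 75 failing persons overlap, giving 33 1/3 per cent. Returning `None` when nobody passes or nobody fails is an assumption of the sketch.

```python
from statistics import mean, median


def overlap_value(responses, totals, item, use_median=False):
    """O.L. value: percentage of persons failing `item` whose total score
    equals or exceeds the mean (or median) total of those passing it.
    Lower is better. Returns None when nobody passes or nobody fails."""
    passing = [t for r, t in zip(responses, totals) if r[item] == 1]
    failing = [t for r, t in zip(responses, totals) if r[item] == 0]
    if not passing or not failing:
        return None
    centre = median(passing) if use_median else mean(passing)
    return 100 * sum(t >= centre for t in failing) / len(failing)
```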


EXPERIMENTING WITH McCALL’S METHOD


McCall’s method,† resembling in some respects the bi-serial correlation reduced to apply to items scored dichotomously, may be formulated as follows:‡

$$C = \frac{(M_1 - M_2)\, N_1 \times N_2}{N}$$

where

C is the coefficient of value for the item.

$M_1$ is the mean total score of those persons making a positive reaction to the item (i.e., passing the item, or in the case of the test here used, making a conservative reaction).

$M_2$ is the mean total score of those persons making a negative reaction to the item, or in the present test, a radical reaction.

$N_1$ is the number of persons reacting positively to the item.

$N_2$ is the number of persons reacting negatively to the item.

N is the total number of responses to the item.

* Vincent, Leona E.: A Study of Intelligence Test Elements. Teachers College Contribution to Education, No. 152, 1924.

† McCall, Wm. A., and Students: Construction of Multi-Mental Scale. Teachers College Record, XXVIII, 394-415.

‡ McCall is not to be held responsible for this use of his method. He and his students developed their formula for use with five-way items of the Multi-Mental Scale.
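The following sketch computes McCall's coefficient. It reads the printed formula as C = (M1 - M2) × N1 × N2 / N, which is an interpretation of a badly reproduced formula and not a confirmed transcription. It uses the same hypothetical `responses`/`totals` layout as the other sketches. Giving 0.0 to an item that everyone passes (or fails) is also an assumption.

```python
def mccall_value(responses, totals, item):
    """C = (M1 - M2) * N1 * N2 / N under the reading stated above:
    M1, M2 are mean totals of those reacting positively / negatively,
    N1, N2 their numbers, N = N1 + N2."""
    pos = [t for r, t in zip(responses, totals) if r[item] == 1]
    neg = [t for r, t in zip(responses, totals) if r[item] == 0]
    if not pos or not neg:
        return 0.0  # assumption: an item that does not split the group gets no credit
    m1, m2 = sum(pos) / len(pos), sum(neg) / len(neg)
    n1, n2 = len(pos), len(neg)
    return (m1 - m2) * n1 * n2 / (n1 + n2)
```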


EXPERIMENTING WITH THE LENTZ METHOD (S.Ag.)


In the Summation of Agreements method proposed by Lentz, the item being evaluated is credited, in the case of each subject, with the number of responses to all items which agree with that subject’s response to the item in question. The mean of the total agreeing scores constitutes the measure of the item’s value. The formula for this coefficient is

$$C = \frac{N_{a_1} + N_{a_2} + N_{a_3} + \cdots + N_{a_N}}{N}$$

where C is the coefficient.

$N_{a_1}$ is the total number of responses by the first subject which agree with that subject’s response to the item being considered.

$N_{a_2}$ is the total number of responses by the second subject which agree with that subject’s response to the item being considered.

N is the total number of persons responding to the item.

In other words, an item is credited to the degree that it agrees with the other items of the test in identifying the subject as conservative or radical. To illustrate: Suppose item number one is reacted to conservatively (passed) by subject number one, and subject number one reacts conservatively to (or passes) eighty items in all; then item one is credited with eighty points. If this item is reacted to radically by the second subject, and the second subject reacts radically to ninety items in all, then item one is credited with ninety more points, and so on for item one for all the subjects. In similar manner each item is evaluated.*

* A preliminary step in the application of this method was to balance the test as a whole so that the average score of all cases used amounts to one-half the number of items. This is one way to avoid certain spurious trends which seem to inhere in the method when the test is not thus balanced. This balance of the test was here achieved by dropping the fifteen most conservative items. When applied to a test thus balanced, the SAg method is equivalent to the A.S.C.I.D. method later described.

Having evaluated each of the items by each of the methods just described, the next step was to list the best one hundred items by each method. This gave four sets of items, or four tests, each consisting of one hundred items. It must be remembered that these four tests were not mutually exclusive, but were overlapping, since each set of one hundred was chosen from the same original set of one hundred fifty. The overlapping among sets amounted to seventy-eight per cent, that is, seventy-eight items were common to all four sets. The four sets of tests so chosen were then tested for reliability on the basis of the reactions of the two hundred eleven student subjects. Using the odd vs. even Spearman-Brown technique, we get the following reliability coefficients, estimated for one hundred fifty items of value equivalent to the one hundred chosen.

U.L. ........................................ .91
O.L. ........................................ .89
McC ......................................... .86
S.Ag. ....................................... .90

Original test (150 items) ................... .79

The one hundred items selected by the U.L. method were then administered to a new group of some sixty students with a resulting estimated reliability coefficient of .89 for one hundred items and .93 for one hundred fifty items, both of which are in marked contrast to the coefficient of .79 for one hundred fifty items of the original test.

In order to get at the validity of the methods in another way, the inter-correlations between methods were computed. Averaging the correlations of each method with each of the other three, we have the following:

U.L. ........................................ .79
O.L. ........................................ .79
McC ......................................... .76
S.Ag. ....................................... .80

The time necessary for employing any of these methods must also be taken into consideration. U.L. was found to require the least time; SAg was next, while the other two were still longer. If it requires two hundred hours to apply the McCall method, it would probably require one hundred seventy-five for the O.L., one hundred fifty for SAg, and fifty to one hundred hours for the U.L., depending upon the elaborateness of the tabular layout for other purposes. Curiously enough, the more effective the method, the simpler and the less expensive. Taking all factors into account, our conclusion is that the U.L. method is preferable and the SAg method follows as a poor second, though superior to the other two. The principle of the U.L. method is supported by this experiment. Whether it is best to take the upper and lower thirds, or upper and lower fifths, or some other fraction remains to be demonstrated. This may depend upon the ease with which additional cases are obtainable.
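The Summation of Agreements coefficient described in this paper can be sketched in a few lines. The sketch uses a hypothetical layout in which `responses[p][i]` is 1 for a conservative reaction. As in the paper's illustration, the count credited for each subject includes the item itself ("eighty items in all"). The preliminary balancing of the test mentioned in the footnote is not performed here.

```python
def summation_of_agreements(responses, item):
    """S.Ag. value of `item`: mean over persons of the number of that person's
    responses (all items, this one included) that agree with the person's
    response to `item`."""
    n_items = len(responses[0])
    credit = 0
    for r in responses:
        conservative = sum(r)
        credit += conservative if r[item] == 1 else n_items - conservative
    return credit / len(responses)
```

Take the paper's two-subject illustration on a 150-item test. The first subject passes item one and is conservative on 80 items. The second fails item one and is radical on 90 items. Item one then earns (80 + 90) / 2 = 85 points.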

Three short methods not here evaluated are Clark, A.S.C.I.D.,
and Balance.

1. Clark’s¹ formula for evaluating test items is:

$$\text{Item value} = \frac{P - D}{1 - D}$$

where

D is the percentage of the group failing to answer the item; it is the difficulty of the item.

P is the percentage of the criterion group who failed to answer the item. The criterion group for any one item is the D percentage of the class who have the lowest scores on the total test. To illustrate: Suppose forty per cent of all subjects fail the item and eighty per cent of the lowest forty per cent of the subjects fail the item. Substituting in the formula, we have

$$\frac{.80 - .40}{1 - .40} = .67$$

2. In the A.S.C.I.D. method (devised by Lentz), the algebraic summation of the consistency-inconsistency deviations, the evaluations are found by totaling algebraically the consistency-inconsistency deviations from the mean. The difference between the individual conservatism score and the mean of all conservatism scores is found for each subject. For each item on which a conservative person (one whose score is above the mean) answers conservatively, the item is given this difference as a positive value; when a conservative person answers radically, it becomes a negative value. When a radical person (one whose score is below the mean) answers radically, the difference is considered as positive; when a radical person answers conservatively, it is negative. Let us assume, for instance, that the mean score of the group is seventy-two. Subject A has a score of seventy-eight and subject B has a score of sixty-eight. If A responds to the first item conservatively, the item is credited with a plus six; if B answers the same item radically, it is given a plus four. On the second item the reactions are reversed: A answering radically and B conservatively credits item two with a minus six and a minus four. The total evaluation of the item is the algebraic sum of these differences. This method ranks items very differently when the mean shifts. Apparently it overemphasizes the function of the mean score of all the subjects in the value of the item.

¹ Clark, E. L.: A Method of Evaluating the Units of a Test. Journal of Educational Psychology, Vol. XIX, 1928, pp. 263-265.
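The two short methods just described can be sketched as follows, again assuming the hypothetical `responses`/`totals` layout. For Clark's method, the criterion group is taken to be the k lowest-scoring persons, where k is the number failing the item. Ties in total score are broken by input order, and returning `None` when nobody (or everybody) fails are choices of the sketch. For A.S.C.I.D., an optional `mean` argument lets the paper's example (group mean 72) be reproduced with only two subjects. A person exactly at the mean contributes nothing.

```python
def clark_item_value(p, d):
    """Clark's item value (P - D) / (1 - D), with p and d as proportions."""
    return (p - d) / (1 - d)


def clark_value(responses, totals, item):
    """D: proportion of all persons failing the item. P: proportion failing it
    within the criterion group, the D fraction with the lowest totals."""
    n = len(totals)
    failed = [1 - r[item] for r in responses]
    k = sum(failed)                        # size of the criterion group
    if k == 0 or k == n:
        return None                        # undefined when nobody (or everybody) fails
    lowest = sorted(range(n), key=lambda person: totals[person])[:k]
    p_low = sum(failed[q] for q in lowest) / k
    return clark_item_value(p_low, k / n)


def ascid_value(responses, totals, item, mean=None):
    """A.S.C.I.D. value: sum over persons of |total - mean|, counted positive when
    the response to `item` agrees with the person's side of the mean and
    negative otherwise."""
    if mean is None:
        mean = sum(totals) / len(totals)
    value = 0.0
    for r, t in zip(responses, totals):
        above_mean = t > mean
        conservative_answer = r[item] == 1
        deviation = abs(t - mean)
        value += deviation if above_mean == conservative_answer else -deviation
    return value
```

With the paper's numbers, `clark_item_value(.80, .40)` gives about .67. With A at 78 and B at 68 around a mean of 72, `ascid_value` credits the first item with +10 and the second with -10, matching the worked example.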

3. Balance.—A very simple comparison of items is made by finding 
the balance. One might think of a balanced item in exactly the same 
way in which he thinks of a balanced scale. If there are the same 
weights on both sides of the scale, it is said to be balanced. In the 
case of the test item, if there are the same number of conservative as 
radical responses, or the same number of passes as failures, the item 
is said to have perfect balance. Ordinarily when we speak of balance we mean the proximity of approach to perfect balance. Balance is always expressed as half or less than half the total number of cases. If the number of passes is less than half the total number of subjects, it is taken as the measure of balance. If the number of passes is more than half the number of subjects, then the number of failures is taken as the measure of balance.

This minority number can be reduced to a percentage, in which case fifty per cent would represent perfect balance. This method of evaluating items rests on the assumption that, other things being equal, that item is better which more nearly divides the group equally.¹ (It can be conceived as follows: an item passed by one and failed by ninety-nine makes ninety-nine differentiations, that is, one times ninety-nine, whereas the item passed by fifty makes 2500 differentiations, that is, fifty times fifty.) This hypothesis does not hold for many types of tests of skill and knowledge, especially power tests, but appears applicable to some educational tests and especially applicable to some types of character tests. We find appreciable correlation between the Balance method and the U.L. method of evaluation (.48) but zero correlation with the Overlapping method. The fact of balance is suggested as a preliminary inexpensive method of sizing up items and as an aid in understanding why certain items fail to measure up. Slight changes in wording may appreciably affect the balance of an item.
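Balance and the "differentiations" count in the parenthesis above reduce to two one-line functions. The function names are illustrative. Note that an item passed by exactly half the group makes the largest possible number of pass-fail pairings.

```python
def balance(passes, n):
    """Balance of an item: the minority count (passes or failures, whichever is
    smaller) and the same figure as a percentage; 50 per cent is perfect balance."""
    minority = min(passes, n - passes)
    return minority, 100 * minority / n


def differentiations(passes, n):
    """Number of passer/failer pairs the item separates: passes times failures."""
    return passes * (n - passes)
```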





¹ Symonds, P. M.: “Choice of Items for a Test on Basis of Difficulty.” Journal
of Educational Psychology, Vol. XX, Oct., 1929, pp. 481-493. 
THE INTELLIGENCE OF EAST TENNESSEE
MOUNTAIN CHILDREN 


L. R. WHEELER 


State Teachers College, Johnson City, Tennessee 


The Southern Highlands have inspired many interesting romances and
legends which have painted the mountaineer as a unique individual:
self-satisfied, extremely independent, superstitious, and above all
backward. Sociological and religious studies have added much
subjective evidence to strengthen this characterization of the
Highlanders, and have endeavored to point out many explanations and
causes for their peculiar personalities, habits, speech, songs, feuds,
economic status, health, and happiness. It is now possible to study
these very interesting people with objective measures, thus adding
another interpretation of the character of the mountaineer. By
objectively measuring intelligence, educational status, physical
development, and special abilities, much can be learned about the
so-called "backward" mountaineers. This method of study may lead to
new interpretations of these people, and so provide a basis for
eliminating many false opinions and odd conceptions of this section
of our country.

About five years ago,¹ Hirsch made an experimental study of the East
Kentucky mountaineers² primarily to show the influences of heredity
and environment. He gave intelligence tests to one thousand nine
hundred forty-five children in representative mountain schools and
studied the results. The most important single discovery he made was
the low IQ rating of the children, the average for the total group
being seventy-nine with a standard deviation of 15.8. Over a thousand
of these subjects were given educational tests, and with the
exception of the seventh and ninth grades the average EQ (Educational
Quotient) was several points higher than the average IQ.

Hirsch made comparisons among different chronological age groups as
to the increase and decrease of intelligence rating, and found that,
ascending the chronological scale, the greatest difference in average
IQ between any two age groups is twelve points. The difference
between the average of the seven-year-old age group and the





1 Date of gathering data not mentioned in Genetic Psychology Monographs,
Vol. III, No. 3.
2 Hirsch, M. D. M.: An Experimental Study of the East Kentucky
Mountaineers. Genetic Psychology Monographs, Vol. III, No. 3, March 1928.
average for the entire group is about six points. The difference
between the average for the seven-year age group and the average in
an urban community would be about sixteen points. From this he comes
to the highly speculative conclusion that "environment can only
account for a little more than one-third of the deficiency in IQ,"¹
attributing the balance to heredity. While Hirsch finds that the
average IQ decreases with increasing chronological age, he finds that
the IQ increases as we ascend the grade scale. From this fact he
concludes that the "slow decline of IQ in the age-groups is for the
most part due to environmental factors. The marked increase of the
seventh and eighth grade IQ is not due to environmental factors (more
and better education) but is a function of selection, operating on
the basis of innate intelligence."² That is, by the time the seventh
and eighth grades are reached in the mountain schools, the first
quartile, or lower twenty-five per cent, of the fifteen-year-and-over
age group is still in the lower grades, while the upper quartile of
the thirteen- and fourteen-year age groups is in the seventh and
eighth grades.
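Hirsch's "little more than one-third" can be checked with simple arithmetic if the six-point drop from the seven-year-olds to the whole group is set against the sixteen-point gap between the seven-year-olds and an urban community. Pairing the two figures this way is our reading of his argument, not a calculation spelled out in the passage quoted; the sketch only confirms that 6/16 is a little more than one-third.

```python
from fractions import Fraction

# IQ-point figures quoted from Hirsch above; the ratio is our reading of his argument.
decline_with_age = 6   # seven-year-old average minus whole-group average
gap_to_urban = 16      # seven-year-old average vs. an urban community

share = Fraction(decline_with_age, gap_to_urban)
print(share, float(share))       # 3/8 0.375
print(share > Fraction(1, 3))    # True
```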

Another method Hirsch used in attacking the problem of heredity and
environment as causative factors in the general level of intelligence
was to correlate IQ's and EQ's with chronological age and then to
compare the two correlations. Here he found low negative correlations
in both cases, indicating a slight tendency for both averages to fall
as we go upward in the chronological scale. On this point he states,
"nevertheless this indication of an environmental factor in the
determination of our subject's average is more than balanced by the
fact that the relative decrease of intelligence with chronological age
is greater than the decrease of educational ability with chronological
age."³

Hirsch concludes that "close inbreeding, in conjunction with
selective migration, accounts for the most part for the low general
intelligence of the East Kentucky Mountaineer. His repressive
economic and social milieu is a minor causative factor, but is in part
an effect of the inherent lack of psychic energy and initiative."⁴

After reviewing Hirsch's study, it appears that his conclusions as to
the influence of heredity and environment on the low IQ rating of the
Kentucky mountain children are broader and more conclusive than the


1 See page 220, ibid. 
2 See page 220, ibid. 
3 See page 205, ibid. 
4 See page 239, ibid. 
data he has produced. It seems he has read too many causative factors
into the facts of his study, and failed to consider some important 
details in technique which might alter the interpretations of his data. 
For one thing he has assumed that the tests used were so constructed 
as to give an accurate measure of the intelligence of mountain children. 
The nature of their isolated environment gives this group very different 
experiences and associations from the groups in other sections of the coun- 
try upon whom the tests were standardized, and many items of the 
tests which are familiar to the average child are largely foreign to the 
experiences of the mountain child, as we shall attempt to show later in 
this study of Tennessee mountain children. Our data show that the 
Dearborn Intelligence test used by Hirsch somewhat handicaps the 
mountain child in the very nature of some of its items. 

It seems that Hirsch's study would have been strengthened and
clarified if the author had stated when the data were gathered and
described the technique of giving and scoring the tests. In a study
of this kind, when one is attempting to explain the influence of
heredity and environment, the groups studied should be as nearly
unselected as possible. Hirsch mentions that several of his groups
are selected, but fails to explain on what basis, or to take that
fact into careful consideration when interpreting his data. The
majority of recent investigations have emphasized that the younger
the children, the more desirable are the data for a study of innate
and acquired characteristics. This principle might be applied to the
importance of studying the early school and preschool age periods of
mountain children. Hirsch's study showed that the IQ's of mountain
children decreased with an increase in chronological age from ages
five and six to fifteen. He explains this decrease as due one-third
to environment and the balance to heredity. Another explanation he
sets forth for the low intelligence of mountain children is
inbreeding, and he bases this conclusion on very subjective data, such
as the similarity of names in the different communities. It seems
that he should have made a more objective investigation of a problem
of this kind before drawing definite conclusions. He also attributes
subnormal intelligence to low psychic energy and initiative, a factor
which, for lack of scientific measures, the writer is unable either to
define or to explain. The last cause cited by Hirsch that we would
like to mention is selective migration, which can only be ascertained
by testing a large number of cases who have migrated and comparing
these with those individuals who have remained in the mountains.
During the fall and winter of 1929-1930 the following experiment was
made on East Tennessee mountain children to determine their mental
status and also to study the effect of schooling and environment on
the results of the Dearborn and Illinois Intelligence Tests.

The writer wishes to acknowledge the valuable aid given by Mrs.
Viola D. Wheeler in administering and scoring the tests and in other
assistance rendered in making the investigation, and also that of
Superintendents K. P. Banks and C. A. McCanless of Carter and Unicoi
Counties and of the public school teachers who have coöperated in the
experiment.

Intelligence tests were given to children in the mountains of East
Tennessee. All these children lived in the mountains and valleys,
under the direct influence of an isolated mountain environment, and
they should give a fair picture of the mountain children of East
Tennessee. All of the cases were drawn from public schools which
should be representative of the different types of schools found in
the section, ranging from the poorest to the best, and from the
one-teacher school to schools employing five and six teachers. The
environment of these children is strictly rural. Agriculture is the
chief occupation of the parents, with a small percentage engaged in
lumbering and coal mining. There are many phases of the general
background and social conditions of these children which might be of
interest, but the economic and sociological aspects of mountain life
are not within the scope of this study. We shall point out only
certain general trends which have a direct bearing on an objective
study of the intelligence of mountain children.

The growing educational opportunities in the mountains are 
materially changing the isolated sections. The state is providing 
modern and adequate schools in the very heart of the mountains, and is 
sending well-trained teachers, many of whom hold or are working
toward college degrees, into those schools to teach the mountain
children. On the whole, the teachers are much better trained and are
doing a far better job of teaching than the average person unfamiliar
with this mountain section would think. Educational opportunities of the
mountains have advanced with the improvement of roads, thus ena- 
bling consolidation of schools in a number of sections. As this is only a 
recent development, it will be interesting to note the influence of better 
schools on the results of later intelligence test data on the same groups 
of children we are now studying. 

The tests used in this study are: Dearborn IA for Grades I, II, and
III; Dearborn IIC for Grades IV, V, VI, VII, and VIII; and the Illinois
General Intelligence Scale for Grades III, IV, V, VI, VII, and VIII.
Both the Illinois and Dearborn tests were given to each child in
Grades III through VIII for the purpose of comparing the two tests on
children in a mountain environment; additional cases were then added
to increase the reliability of other interpretations.

One reason for selecting the Dearborn and Illinois tests was to
compare the possible influence of language and educational factors on
the intelligence rating of mountain children. In addition, each test
can serve as a check on the results of the other. We would expect the
children in the better schools and in the higher grades to make a
higher score on the test whose items depend more upon schooling. We
realize that neither test is entirely free from educational influence,
but the Illinois seems much more dependent upon this factor, since a
number of its items depend upon schooling, while the Dearborn contains
more non-language tests, requiring less reading ability. No attempt
will be made to discuss the standardization of the tests except where
it has a direct bearing on this investigation.

All tests were given by the writer and Mrs. Wheeler, and every
precaution was taken to establish proper rapport before giving the
tests. The order of the two tests was alternated so as to eliminate
possible errors in testing and the fatigue element among the
children. The results have been carefully scored and checked, in an
effort to neglect no part of the technique of measurement.¹ Special
efforts were made to keep the collection and working up of the data
for this investigation as scientific as possible.
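The ± figures attached to the medians in the tables are probable errors. The sketch below uses one standard formula of the period, PE of the median = 1.2533 Q / √N, where Q = (Q3 - Q1)/2 is the quartile deviation. We chose it because it reproduces the printed entries (for example ±.62 for the 564 Illinois cases and ±1.88 for the 27 nine-year-olds in Table I); we have not confirmed that it is the exact form the author took from Garrett.

```python
import math


def quartile_deviation(q3: float, q1: float) -> float:
    """Semi-interquartile range Q = (Q3 - Q1) / 2."""
    return (q3 - q1) / 2


def pe_median(q: float, n: int) -> float:
    """Probable error of a median, assuming a roughly normal distribution.

    sigma_Mdn = 1.2533 * sigma / sqrt(N) and PE = 0.6745 * sigma; since
    Q = 0.6745 * sigma as well, PE_Mdn = 1.2533 * Q / sqrt(N).
    """
    return math.sqrt(math.pi / 2) * q / math.sqrt(n)


if __name__ == "__main__":
    # Illinois, total group (Table I): N = 564, Q3 = 91.78, Q1 = 68.38.
    q = quartile_deviation(91.78, 68.38)
    print(round(q, 2), round(pe_median(q, 564), 2))  # 11.7 0.62
```

The table prints Q as 11.72 rather than 11.70 for this row, presumably from rounding in the original working; either value gives the same probable error to two places.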


A COMPARISON OF THE ILLINOIS AND DEARBORN TESTS 


It is fairly well agreed that the language element and schooling are
significant factors influencing the IQ rating of children in
different environmental areas. For this reason we have attempted to
compare the results of the Illinois and Dearborn tests on mountain
children. The Dearborn IA, which was used in Grades I, II, and III, has
practically no language requirements except the verbal directions
given by the tester, and it has a number of elements similar to the
Binet individual test. The Illinois Intelligence Scale is based
largely upon reading ability and schooling. The Dearborn IIC which includes





1 Statistical formulae taken from Garrett, H. E.: Statistics in Psychology and 
Education, N. Y., 1926. 
Grades IV-VIII is based upon performances which require less
schooling than the Illinois Intelligence Scale. There is, of course,
a language element in certain parts of this test, but to a lesser
degree than in the Illinois. It would seem that the mountain child
would be less handicapped on the Dearborn test in so far as schooling
influences the score.

The total number of children who were given both tests for the
purpose of comparison is five hundred sixty-four, as shown in Table I.
These children took both tests on the same day with a brief
intermission between them, and the order of giving the tests was
reversed in alternate groups to eliminate the effect of fatigue and
other factors which might influence the results.


TABLE I. COMPARISON OF IQ'S ON ILLINOIS AND DEARBORN TESTS, EACH CHILD
GIVEN BOTH TESTS

Illinois tests
CA      No. of   Range     Median          Q-3      Q-1     Q
        cases
Total   564      40-145    80.74 ± .62     91.78    68.38   11.72
 9       27      70-115    92.5  ± 1.88   101.56    85.94    7.81
10       63      55-110    83.4  ± 1.61    95.31    74.88   10.22
11       81      55-135    85.23 ± 1.60    93.65    70.63   11.51
12       98      45-145    81.36 ± 1.39    92.71    70.63   11.04
13      102      50-125    80.5  ± 1.60    91.94    66.04   12.95
14      122      45-125    75.36 ± 1.31    87.36    64.16   11.60
15       64      40-110    72.5  ± 1.89    86.11    61.87   12.12

Dearborn tests
CA      No. of   Range     Median          Q-3      Q-1     Q       Median diff.   Chances
        cases                                                                      in 100
 9       27      65-125    95.31 ± 1.18    99.53    89.69    4.92   2.81 ± 2.22    80
10       63      65-125    89.72 ± 1.61    98.45    77.96   10.25   6.31 ± 2.28    97
11       81      55-125    87.2  ± 1.56    98.75    76.35   11.20   1.97 ± 2.23    72
12       98      55-135    81.75 ± 1.16    93.13    74.80    9.17    .39 ± 1.81    55
13      102      50-120    79.38 ± 1.20    90.28    70.90    9.69   1.12 ± 2.00    64
14      122      50-110    75.67 ± 1.16    85.68    65.16   10.21    .31 ± 1.75    54
15       64      50-95     72.5  ± .71     82.50    65.42    8.54    .00 ± 2.46    50
Table I shows the results of this comparison. The total group
contains cases from six to twenty years of age, but because of the
small number of cases below nine years and above fifteen, those ages
were omitted in constructing this table. There is a wider variation
in intelligence as measured by the Illinois test, as shown by the
range. The range on the Illinois is twenty points greater than that
on the Dearborn test: for the whole group, the total spread of IQ's
on the Illinois runs from forty to one hundred forty-five, while on
the Dearborn it runs only from fifty to one hundred thirty-five, with
an average difference in range of about ten points for each age.
The median IQ's on both the Dearborn and the Illinois decrease with
an increase of chronological age. This would seem to indicate that as
the child grows older there is a fairly constant fall in intelligence.
This trend will be discussed in more detail when additional cases
substantiate these findings in later tables. The median IQ on the
Illinois test is 92.5 at the age of nine years and falls twenty points
by the age of fifteen. This seems to show that while the median
mountain child is in the normal group of intelligence rating at the
age of nine, he falls to the low limit of the dull group at age
fifteen. On the Dearborn test the median IQ is 95.3 at age nine and
drops to 72.5 at age fifteen, showing the same trend as the Illinois
test. This is further substantiated by the first and third quartiles
on both tests. The median intelligence of mountain children measures
slightly higher on the Dearborn test for ages nine, ten, and eleven,
but at the other ages the difference between the two tests is
practically no more than a chance variation, as shown in Table I.
This small difference might indicate that the performance test gives
a slight advantage over one based on school learning in the earlier
years, but at the upper levels these mountain children seem to do as
well on one type as on the other. The difference is so small, even
for the younger children, that little value can be placed on the
variation between the two. As careful comparison of the two tests
shows little or no marked difference in results, we have found it
safe to add cases to both groups, regardless of whether each child
took both tests, in order to increase the reliability of the study of
the intelligence of mountain children.
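The "Median diff." and "Chances in 100" columns of Table I can be reproduced with the usual probable-error reasoning, which we assume is what the author applied: the PE of a difference between two medians is taken as √(PE₁² + PE₂²), and the chances in 100 that the true difference exceeds zero are read from the normal curve at D/PE_D probable errors (one PE is 0.6745 standard deviations). Treating the two medians as independent is itself an assumption, since the same children took both tests.

```python
import math


def pe_difference(pe1: float, pe2: float) -> float:
    """PE of the difference between two medians, treated as independent."""
    return math.hypot(pe1, pe2)


def chances_in_100(diff: float, pe_diff: float) -> float:
    """Chances in 100 that the true difference is above zero (normal curve)."""
    z = 0.6745 * diff / pe_diff          # convert PE units to sigma units
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))


if __name__ == "__main__":
    # Age nine, Table I: Dearborn 95.31 ± 1.18 vs. Illinois 92.5 ± 1.88.
    d = 95.31 - 92.5
    pe = pe_difference(1.18, 1.88)
    print(round(d, 2), round(pe, 2), round(chances_in_100(d, pe)))  # 2.81 2.22 80
```

The same two functions give 97 chances in 100 for age ten (6.31 ± 2.28) and exactly 50 for a zero difference, matching the printed column.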


CHRONOLOGICAL AND MENTAL AGES OF MOUNTAIN CHILDREN 


The total number of cases tested with the Illinois test is eight
hundred forty-five, ranging from the third through the eighth grade.
For the Dearborn test we have nine hundred forty-six cases, ranging
from Grade I through VIII, as shown in Table IV. Table II shows a wide
range of chronological ages in each grade, from five years to thirteen
years six months in the first grade, and up to twenty-one years in the
eighth grade.

The median mountain child in the first grade is retarded about one
and a half years, and this retardation increases to two years or more
in the following grades. The percentage of retardation in the upper
grades decreases, probably because older children who have difficulty
in making their grades drop out to work when compulsory education
stops at the age of fourteen. Another probable factor in retardation
is the lack of rigid enforcement of the compulsory education laws.

The average mountain child in the first grade has a mental age of 
six years three months and a chronological age of seven years five 
months. While the Dearborn mental age shows the children to be 
normal in Grades I, II, III, as far as ability to do standard grade work, 
the chronological ages show them definitely retarded. 
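The two first-grade averages can be related through the classical ratio definition of IQ, 100 × MA / CA, on which we assume the IQ's of this period rest. Converting years-months to months, the sketch gives about 84, close to the first-grade Dearborn median of 84.1 in Table IV; a ratio of averages need not equal a median of ratios, so the agreement is only approximate.

```python
from typing import Tuple


def to_months(years: int, months: int) -> int:
    """Convert an age written years-months (e.g. 6-3) into months."""
    return 12 * years + months


def ratio_iq(mental_age: Tuple[int, int], chronological_age: Tuple[int, int]) -> float:
    """Ratio IQ = 100 * MA / CA, both ages given as (years, months)."""
    return 100 * to_months(*mental_age) / to_months(*chronological_age)


if __name__ == "__main__":
    # Average first-grade mountain child, as reported above: MA 6-3, CA 7-5.
    print(round(ratio_iq((6, 3), (7, 5)), 1))  # 84.3
```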

For Grades IV-VIII they are not only retarded chronologically but
are also below the standard mental age for the grade. This means that
these children cannot successfully do the normal work of the grades
they are in. The Illinois mental age shows a similar trend from Grade
III through VIII; it is somewhat lower than the Dearborn for Grades
III through VI, but slightly higher in Grades VII and VIII.


INTELLIGENCE ACCORDING TO CHRONOLOGICAL AGE 


Table III shows that the median intelligence of the six-year-old
children is 94.7, only about five points short of a perfectly normal
group.

As chronological age increases there is a consistent decrease in
intelligence, until at the age of sixteen the IQ has fallen about
twenty-five points. It seems from the Dearborn tests that when the
child enters school he is only a few points below normal, but as he
continues through the succeeding years the median intelligence of the
mountain child falls to the lower limit of dullness, near the line of
demarcation between dullness and feeblemindedness. These results are
substantiated by the first and third quartiles, which show a fairly
consistent decrease in IQ. The median IQ as measured by the Illinois
test is 85.4 for age eight and decreases as the children grow older,
until at age sixteen they fall near the borderline of
feeblemindedness. It is interesting to note that both the Illinois
and Dearborn tests show a similar decrease in IQ, with the Illinois
consistently a few points lower than the Dearborn. These facts are
further substantiated by a fairly consistent decrease in the first and
third quartiles.

TABLE III. COMPARISON OF IQ'S ACCORDING TO AGE

CA    No. of   Range     Median IQ       Q-3      Q-1     Q       Median diff.   Chances
      cases                                                                      in 100
Illinois Tests
 8     23      40-95     85.41 ± 1.32    89.87    79.68    5.09
 9     47      40-115    86.78 ± 1.67    96.04    77.68    9.18
10     90      50-110    80.33 ± 1.34    92.18    71.83   10.17
11    102      50-135    78.33 ± .93     91.39    66.83   12.28
12    113      45-145    77.9  ± 1.52    91.57    65.68   12.94
13    120      45-125    77.5  ± 1.27    89.65    62.94   13.35
14    125      45-125    74.84 ± 1.37    86.52    64.13   11.19
15     63      40-95     71.07 ± 2.10    86.41    59.79   13.31
16     31      45-85     69.38 ± 2.64    82.08    58.44   11.77

Dearborn Tests
 6     33      75-115    94.68 ± 2.03   107.75    89.06    9.34
 7     62      55-125    90.9  ± 1.38    98.61    81.25    8.68
 8     60      40-110    88.88 ± 1.09    95.90    82.33    6.78   3.47 ± 1.72    92
 9     94      60-145    86.38 ± 1.79    95.22    79.75    7.72    .40 ± 2.45    86
10     99      50-125    84.25 ± 1.64    94.75    74.29   10.23   3.92 ± 2.12    89
11    102      50-130    80.00 ± 1.49    94.32    70.19   12.06   1.67 ± 1.76    74
12    107      50-135    81.41 ± 1.06    91.88    74.25    8.81   3.51 ± 1.86    89
13    109      45-120    77.61 ± 1.22    86.97    66.56   10.21    .11 ± 1.76    51
14    125      45-115    74.72 ± 1.09    82.80    63.39    9.71    .12 ± 1.76    51
15     61      59-95     73.44 ± 1.39    82.93    65.56    8.71   2.37 ± 2.52    73
16     29      45-95     73.5  ± 2.41    84.81    64.06   10.37   4.12 ± 3.57    78


INTELLIGENCE ACCORDING TO GRADE 


Table IV shows that the mountain children are consistently below
normal in intelligence in all grades studied.

They seem to make a slightly higher score on the Dearborn test in the
majority of grades, but while the differences shown here are real,
they are probably not great enough to be significant for this study.
There is also an indication that children in the earlier grades make
a higher score on the Dearborn test, while children in the upper
grades make a higher score on the Illinois. This might indicate the
influence of schooling in the upper grades on the Illinois test.
TABLE IV. COMPARISON OF IQ'S ACCORDING TO GRADE AND TOTAL DISTRIBUTION

Grade   No. of   Range     Median          Q-3     Q-1     Q       Median diff.    Chances
        cases                                                                      in 100
Illinois Tests
III     142      40-120    70.43 ± 1.07    82.35   61.90   10.23
IV      190      45-120    75.9  ± .98     87.61   67.90   10.87
V       149      45-115    74.4  ± 1.19    87.05   63.83   11.61
VI      130      40-135    83.0  ± 1.44    96.53   70.16   13.19
VII     145      50-145    82.02 ± 1.71    90.98   73.31    8.79
VIII     89      45-125    86.8  ± 1.17    95.43   77.75    8.84
Total   845      40-145    78.28 ± .47     89.39   66.87   11.26

Dearborn Tests
I       115      45-115    84.1  ± 1.14    94.42   74.91    9.76
II       87      45-125    85.4  ± 1.04    92.66   77.08    7.79
III     103      45-150    83.96 ± 1.27    93.54   72.92   10.32   10.53 ± 1.67    99.7
IV      172      45-135    81.5  ± 1.18    94.17   69.30   12.44    5.60 ± 1.54    99
V       137      50-125    76.1  ± 1.13    88.75   67.58   10.59    1.70 ± 1.65    75
VI      117      55-125    81.9  ± 1.17    93.84   73.63   10.12    1.10 ± 1.57    68
VII     128      50-130    79.5  ± 1.00    91.36   73.23    9.07    2.52 ± 1.98    80
VIII     87      55-120    84.8  ± 1.13    90.14   73.25    8.45    2.00 ± 2.01    74
Total   946      45-150    82.4  ± .40     92.62   72.70    9.96    4.12 ± .62     100
INTELLIGENCE ACCORDING TO SCHOOLS 


Table V shows that the data were collected from a number of schools
which form a fairly representative group of the different types of
public mountain schools.

The IQ's of mountain children as measured by the Dearborn tests seem
to increase in schools of larger enrollment and greater number of
teachers. There was no marked increase in IQ's in the larger schools
as measured by the Illinois test. The mountain children seem to make
a higher score on the Dearborn test than on the Illinois in the
majority of schools. One would naturally expect the reverse, assuming
that the advantages of the larger schools would help with the
educational requirements of the Illinois test. However, this might be
explained by the very recent consolidation movement in this section.


INTELLIGENCE DISCUSSED IN PERCENTAGE OF OVERLAPPING 


One can get a fairly good picture of a cross section of the mountain
children by considering the percentage of overlapping as shown in
Table VI.
School  No. of    No. of   Range     Median IQ       Q-3      Q-1     Q
        teachers  cases

Illinois Tests
A       6          80      45-120    73.89 ± 7.28    88.5     62.5    13.0
B       5          22      60-125    86.25 ± 2.35    94.13    76.5     8.82
C       5          69      40-145    77.5  ± 1.31    87.86    70.48    8.69
D       5          70      55-125    84.55 ± 1.34    94.02    76.10    8.96
E       4          83      45-135    85.21 ± 1.67    93.00    68.66   12.17
F       4          56      55-105    82.00 ± 1.11    88.64    75.00    6.67
G       4          41      45-105    76.07 ± 1.79    84.75    66.42    9.17
H       3          28      45-110    81.66 ± 1.32    89.16    68.00    5.58
I       3          43      55-110    87.95 ± 1.76    97.25    78.75    9.25
J       3          50      45-120    72.50 ± 1.78    83.93    63.88   10.05
K       3          34      45-110    70.00 ± 2.53    84.38    60.83   11.78
L       3          40      50-135    79.28 ± 1.76    90.00    72.14    8.93
M       2          35      50-115    78.50 ± 2.34    89.06    66.87   11.09
N       2          42      45-110    70.62 ± 2.07    81.88    60.41   10.74
O       2          26      60-110    85.00 ± 2.61    94.38    73.15   10.62
P       2          18      40-120    72.50 ± 4.24    86.25    57.5    14.38
Q       2          25      50-70     63.5  ± 1.23    67.97    58.03    4.92
R       2          21      50-105    75.62 ± 1.63    82.92    71.04    5.99
X       1          19      55-105    76.25 ± 4.21    93.13    63.75   14.69
Y       1          23      45-110    75.83 ± 4.05    91.25    61.27   14.98
Z       1          16      55-105    77.00 ± 2.61    80.00    63.33    8.34

Dearborn Tests
A       6         162      45-120    82.58 ± .82     90.11    73.25    8.43
B       5          64      50-125    84.55 ± 1.58    97.0     76.67   10.17
C       5          69      45-130    80.75 ± 1.20    89.98    74.06    7.96
D       5          70      55-135    88.00 ± 1.46    96.12    76.62    9.76
E       4          88      50-115    84.58 ± 1.71    93.33    68.00   12.67
F       4          59      45-150    87.23 ± 1.33    93.63    77.34    8.15
G       4          37      50-125    80.41 ± 2.60    93.44    68.13   12.66
H       3          50      60-115    83.00 ± 1.56    94.38    76.75    8.82
I       3          44      65-125    88.75 ± 1.75   100.00    81.43    9.29
J       3          65      50-115    76.07 ± 1.97    90.63    65.25   12.69
K       3          32      50-125    75.00 ± 2.07    85.00    66.25    9.38
L       3          18      60-120    85.00 ± 3.22    96.84    74.16   10.84
M       2          35      60-125    85.83 ± 2.20    95.25    74.79   10.23
N       2          34      50-115    75.71 ± 1.67    79.65    64.06    7.79
O       2          26      60-115    85.00 ± 2.68   100.62    78.75   10.94
P       2          19      50-125    78.5  ± 1.63    83.25    71.88    5.69
Q       2          18      50-85     65.0  ± 2.05    72.5     58.5     7.00
R       2           8      45-95     87.0  ± 6.82    93.34    62.5    15.42
X       1          13      60-100    82.5  ± 3.58    96.25    75.62   10.32
Y       1          20      45-105    75.00 ± 4.66    95.00    61.66   16.66
Z       1          16      60-105    78.33 ± 3.58    91.67    68.75   11.46
TABLE VI. PERCENTAGE OF MOUNTAIN CHILDREN ABOVE AND BELOW NORMAL IN IQ
ACCORDING TO CA AND GRADE

          Illinois test       Dearborn test                 Illinois test       Dearborn test
          Above    Below      Above    Below                Above    Below      Above    Below
          normal,  normal,    normal,  normal,              normal,  normal,    normal,  normal,
CA        per cent per cent   per cent per cent   Grade     per cent per cent   per cent per cent
6         -        -          45       55         I         -        -          16       84
7         -        -          21       79         II        -        -          11       89
8         0        100        10       90         III       4        96         10       90
9         15       85         14       86         IV        8        92         15       85
10        17       83         14       86         V         8        92         8        92
11        15       85         18       82         VI        11       89         15       85
12        12       88         12       88         VII       12       88         9        91
13        13       87         4        96         VIII      17       83         6        94
14        7        93         3        97
15        0        100        0        100
Average   11.3     88.7       14.1     85.9                 10.0     90.0       12.0     88.0
It is interesting to note the small percentage of children whose
IQ's are above one hundred on both the Illinois and Dearborn tests.
The mountain children show a higher percentage above normal on the
Dearborn test than on the Illinois. The average percentage of
children above normal, by age, is 11.3 per cent on the Illinois test
and 14.1 per cent on the Dearborn. The average percentage above
normal by grade shows the same general trend as that for the
different ages: ten per cent for the Illinois test and twelve for the
Dearborn. This further substantiates our previous comparison of the
two intelligence tests.
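The "Average per cent" row of Table VI appears to be a plain unweighted mean of the per cent entries, not a mean weighted by the number of cases. The sketch below checks that reading, which is our inference rather than anything the author states. Three inputs are our readings of illegible print, taken as complements of the legible "below normal" figures (Illinois age fourteen: 7; Illinois grades IV and V: 8), and the Illinois age average is matched only if age eight is left out. The Dearborn grade average (12.0) is not reproduced this way, so the reading is at best partial.

```python
# Per cent of children above normal (IQ over 100), as read from Table VI.
ILLINOIS_BY_AGE = {9: 15, 10: 17, 11: 15, 12: 12, 13: 13, 14: 7, 15: 0}
DEARBORN_BY_AGE = {6: 45, 7: 21, 8: 10, 9: 14, 10: 14, 11: 18,
                   12: 12, 13: 4, 14: 3, 15: 0}
ILLINOIS_BY_GRADE = {"III": 4, "IV": 8, "V": 8, "VI": 11, "VII": 12, "VIII": 17}


def unweighted_average(column: dict) -> float:
    """Plain mean of the per cent entries, ignoring how many cases each row holds."""
    values = list(column.values())
    return sum(values) / len(values)


if __name__ == "__main__":
    for label, column in [("Illinois, ages 9-15", ILLINOIS_BY_AGE),
                          ("Dearborn, ages 6-15", DEARBORN_BY_AGE),
                          ("Illinois, grades III-VIII", ILLINOIS_BY_GRADE)]:
        print(label, round(unweighted_average(column), 1))
```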


INTELLIGENCE OF THE ENTIRE GROUP 


A picture of the entire group of mountain children is found in Table
IV and in Graph I. Here the range of intelligence for one thousand
forty-seven mountain children runs from forty IQ on the Illinois test
to one hundred fifty on the Dearborn. About one-half of this entire
group took both tests, thus giving a check on the results, as shown
in Table I. The median IQ for the whole group is 78.28 ± .47 on the
Illinois and 82.4 ± .40 on the Dearborn test.
This places the group as a whole within the classification of dull
children in respect to intelligence as measured by these two scales.
The Dearborn test seems to give a slight advantage to this group over
the Illinois, as shown by the four-point difference between the
median IQ's. As the chances indicate this to be a real difference, it
is interesting to note that the Dearborn tests consistently give the
group a higher score, but this difference is so small that it does
not materially affect the classification of intelligence so as to
place these children in the normal group or vice versa. The quartile
distributions of both tests show the group to follow the normal curve
fairly closely, thus indicating an unselected group.

Graph I. Distribution of IQ's of mountain children on the Dearborn and
Illinois Intelligence Tests.
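One simple check on the statement that the quartiles follow the normal curve is to compare the distances from the median to each quartile. Bowley's quartile coefficient of skewness, (Q3 + Q1 - 2·Mdn)/(Q3 - Q1), is zero for a symmetrical distribution; it is our choice of statistic, not the author's. Applied to the total-group figures of Table IV it gives values close to zero for both tests. Quartile symmetry is consistent with, but does not by itself establish, a normal shape.

```python
def quartile_skewness(q3: float, median: float, q1: float) -> float:
    """Bowley's quartile skewness; 0 for a symmetrical distribution."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


if __name__ == "__main__":
    # Total-group figures from Table IV (Q3, median, Q1).
    print(round(quartile_skewness(89.39, 78.28, 66.87), 3))  # Illinois: -0.013
    print(round(quartile_skewness(92.62, 82.4, 72.70), 3))   # Dearborn: 0.026
```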


Intelligence of Mountain Children 365 


MOUNTAIN CHILDREN COMPARED WITH RURAL, CITY, AND GENERAL
POPULATION ON THE ILLINOIS TEST


By examining Table VII we can compare the intelligence of the 
mountain children as measured by the Illinois test with rural and city 
children, and with the general population. 


TABLE VII. COMPARISON OF THE MEDIAN IQ'S AND MENTAL AGES OF MOUNTAIN,
RURAL, AND CITY CHILDREN, BASED ON ILLINOIS TEST¹

          Mountain schools   Rural schools    City schools     General population
Grade     IQ      MA         IQ      MA       IQ      MA       IQ      MA
III       73      7-4        90      7-10     89      7-11     90      7-10
IV        76      8-2        96      9-4      95      9-5      95      9-4
V         74      9-1        98      10-6     99      10-8     98      10-8
VI        83      10-3       100     11-9     101     12-0     101     11-11
VII       82      11-7       101     13-1     101     13-1     101     13-1
VIII      87      12-7       106     14-3     104     14-4     104     14-3
Average   79.1               98.5             98.1             98.1
1 Buckingham, B. R.: "Manual of Directions of Illinois Intelligence Scale,"
p. 16.


It is interesting to note that the median mental age of the mountain
children in Grade III is only six months below that of rural children
and the general population, and only seven months below that of city
children. This difference increases consistently, reaching one year
and eight months in Grade VIII between the mountain children and the
rural children and general population, and one year nine months
between mountain and city children. The average median IQ for all
grades is 79.1, or nineteen points lower than the average median for
the other groups of children.

Rural children score practically the same on the Illinois test as city children from Grade III through Grade VIII. This is contrary to most of the findings of intelligence test data on rural and city children. The intelligence of mountain children is consistently below that of the rural and city children on the Illinois test.




The Journal of Educational Psychology 


TENNESSEE MOUNTAIN CHILDREN COMPARED WITH KENTUCKY
MOUNTAIN CHILDREN


Table VIII gives the comparison of East Tennessee mountain 
children with East Kentucky mountain children from the results of 
the Dearborn tests. 


TABLE VIII.—COMPARISON OF THE IQ’S OF THE EAST KENTUCKY AND EAST
TENNESSEE MOUNTAIN CHILDREN ON THE DEARBORN INTELLIGENCE TESTS¹

By grade
Grade          | Tennessee               | Kentucky
               | No. of cases | Med. IQ  | No. of cases | Av. IQ
I              | 115          | 84.1     | 14           | 102.1*
II             | 87           | 85.4     | 25           | 90.5
III            | 103          | 83.9     | 23           | 89.9
IV             | 172          | 81.5     | 28           | 85.2
V              | 137          | 76.4     | 29           | 84.0
VI             | 117          | 81.9     | 35           | 86.2
VII            | 128          | 79.5     | 48           | 91.1
VIII           | 87           | 84.8     | 45           | 87.3
Total/average  | 946          | 82.2     | 247          | 89.5

By chronological age
CA             | Tennessee               | Kentucky
               | No. of cases | Med. IQ  | No. of cases | Av. IQ
5 and 6        | 33           | 94.7     | 88           | 86.6*
7              | 62           | 90.9     | 113          | 85.1*
8              | 60           | 88.9     | 180          | 81.0
9              | 94           | 86.4     | 179          | 79.2
10             | 99           | 84.3     | 190          | 78.6
11             | 102          | 80.0     | 191          | 77.2
12             | 107          | [illegible] | 211       | 75.4
13             | 109          | 77.6     | 177          | 73.1
14             | 125          | 74.7     | 174          | 74.6
15 and 16      | 90           | 73.4     | 442*         | 81.1
Total/average  | 881          | 83.2     | 1945         | 79.0
| 














* Very selected group. 


The figures for the Kentucky children are taken from Hirsch’s study of Kentucky mountaineers.¹ The main point of interest here is that both groups of mountain children fall within the dull group according to intelligence rating, even though the Kentucky group shows a slightly higher IQ in most grades. There may be a discrepancy in comparing the Tennessee medians with the Kentucky averages, and this should be kept in mind when reading the table. However, the table does indicate the general trend of both groups as measured by the Dearborn test. By grade, the Kentucky mountain children seem to be higher in intelligence. Neither the Kentucky nor the Tennessee mountain children show a significant increase or decrease in IQ with an increase in grade. Both groups show a fairly consistent decrease in IQ with increase in age, with the exception of age sixteen in the Kentucky group. This should not materially affect the general trend, because the Dearborn IQ is figured on a maximum CA of fourteen and a half years.¹ Both groups show that the younger the child, the higher the IQ as measured by the Dearborn test.

1 Hirsch, N. D. M.: “An Experimental Study of the East Kentucky Mountaineers.” Clark University Press, pp. 200-214.

These studies were made by different individuals, in different mountain sections, with the same test, several years apart. Since the East Tennessee mountain children show a higher median intelligence by chronological age than the Kentucky children tested several years earlier, this might indicate the influence that better educational facilities have on the intelligence of mountain children as measured by intelligence tests.
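The grade-versus-age contrast drawn from Table VIII can be tallied mechanically. In the Python sketch below (the dictionary and function names are hypothetical), each row pairs the Tennessee median with the Kentucky average; as the text cautions, a median and an average are not strictly comparable, so the tally summarizes only the direction of each difference. The illegible age-12 Tennessee figure is entered as `None` and skipped.

```python
# Dearborn IQs from Table VIII: (Tennessee median, Kentucky average).
BY_GRADE = {
    "I": (84.1, 102.1), "II": (85.4, 90.5), "III": (83.9, 89.9),
    "IV": (81.5, 85.2), "V": (76.4, 84.0), "VI": (81.9, 86.2),
    "VII": (79.5, 91.1), "VIII": (84.8, 87.3),
}
BY_AGE = {
    "5&6": (94.7, 86.6), "7": (90.9, 85.1), "8": (88.9, 81.0),
    "9": (86.4, 79.2), "10": (84.3, 78.6), "11": (80.0, 77.2),
    "12": (None, 75.4),  # Tennessee median illegible in our copy
    "13": (77.6, 73.1), "14": (74.7, 74.6), "15&16": (73.4, 81.1),
}


def tally(rows: dict) -> dict:
    """Count, over legible rows, which state's group has the higher IQ."""
    counts = {"tennessee": 0, "kentucky": 0, "tie": 0}
    for tenn, kent in rows.values():
        if tenn is None or kent is None:
            continue  # skip rows with a missing figure
        if tenn > kent:
            counts["tennessee"] += 1
        elif kent > tenn:
            counts["kentucky"] += 1
        else:
            counts["tie"] += 1
    return counts


print("by grade:", tally(BY_GRADE))
print("by age:", tally(BY_AGE))
```

By grade the Kentucky figure is the higher in all eight grades; by age the Tennessee figure is the higher at eight of the nine legible ages, the exception being the selected Kentucky group aged fifteen and sixteen.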

We believe that we have a sufficient number of cases to indicate 
the general trend of the intelligence of mountain children. Our 
data indicate that there is a difference in the scores on the Dearborn 
and Illinois intelligence tests. These differences are not large, 
although they seem to show that mountain children have a slight 
advantage on the Dearborn test where a smaller amount of schooling is 
required. Both tests clearly place the median mountain child below 
the normal in intelligence. 

Table III shows that the intelligence of mountain children at age six, or at the beginning of school, is about 95, well within the normal group, according to the Dearborn test. But with an increase in chronological age we note a decrease in intelligence of about twenty-five points before reaching age sixteen. It seems plausible to speculate that if the children had been measured several years earlier the difference at age six might have disappeared, thus placing this group closer to a perfectly normal group. That is, if the intelligence rating of these children decreased this rapidly from age six to sixteen, what did it do before age six? It may have decreased steadily from birth through the preschool period, according to the degree to which a poor environment affected its development. The Illinois test shows the same general trend of a decrease in IQ with an increase in chronological age, and it seems safe to make a similar speculation about the lower ages from its results.

The influence of environment on the results of intelligence tests has been shown by studies on Indians, Negroes, and canal-boat children. Jamerson and Sandford² found that environmental influences affected the IQ’s of Indian children. This is also indicated by the results of intelligence tests on college freshmen in East Tennessee,¹ where the median intelligence score is lower than that of the average college freshman in other parts of the country.

1 Dearborn, W. F.: “Manual of Directions for Series II.” Pp. 13-16.
2 Jamerson and Sandford: “Mental Capacity of Southern Ontario Indians.” Journal of Educational Psychology, Nov. 1928, p. 536.

The mountain children are below the standards for rural and city children according to the results of the Illinois test. This might indicate the influence of environment, in that the tests were standardized in better educational systems than are found in the schools of the mountains. In comparing the intelligence of the Tennessee mountain children with that of the Kentucky mountain children, the same general trend of subnormality is found, with a decrease in the median and average IQ’s with increasing age as measured on the same test. Dr. Hirsch attributed this decrease mostly to heredity: about one-third to environment and two-thirds to heredity. In view of the data he presents, this is rather speculative evidence of hereditary influence. It may be true that the more intelligent leave the mountains and seek a better environment, but this does not give a satisfactory explanation.

We shall now analyze the first-grade test results of the mountain children and compare their performance with that of an unselected group of children in Johnson City² (Table IX).

The city group is a normal group with a median IQ of ninety-nine. The analysis of the different test items shows that the mountain child in the first grade misses a higher percentage of every item on the test but one. The greatest difference between the two groups of children is in item No. 13, which involved the recognition of money: eighty-seven per cent of the mountain children missed this item, against thirty-seven per cent of the city children. The second greatest was No. 17, which seventy-seven per cent of the mountain children and forty-seven per cent of the city children missed. This item was a time test, and the writer has observed that the mountain children, like most rural children, are in the habit of working very slowly. Of the seventeen items, the seven missed most frequently by mountain children are largely dependent on environment, such as the recognition of stamps, telling the time of day, and dividing marbles. It is clear that the average mountain child does not have very much money to handle and seldom comes in contact with stamps, because little mail comes to or goes from the home. The fourth most frequently missed was the ball-field test, which is not a learning test.

1 Wheeler, L. R.: “Unpublished data on College Freshmen in the East Tennessee State Teachers College.” Johnson City, Tenn.
2 Norms of items from the process of standardizing the Dearborn test were not available for this study.

With this one exception, the items missed most often by the mountain children depend on schooling and general home environment, and the percentage missed decreases as the items depend less on the learning opportunities in the child’s environment. It seems that learning materially influences the test results for the first grade, that is, when the child first enters school. The difference in intelligence between the mountain children and a normal group for the first grade is fairly small, especially for age six, as shown in Table III, and the differences on these items might account for part, if not all, of it. Allowing for them would reduce the apparent difference in intelligence between the mountain children and other normal groups. It may be that the tests are standardized on children who are not materially handicapped by an isolated environment. We are not able to say how much of the difference is due to heredity and how much to environment, but our data indicate that some of it is certainly due to environment.


SUMMARY AND CONCLUSIONS 


1. Practically all studies of the Southern Highlander have been 
based upon subjective data, and treated from a sociological and 
economic point of view. 

2. The most outstanding objective investigation of the intelligence of these people was made by Hirsch on the East Kentucky Mountaineer. He found that the intelligence of mountain children is below the normal, and concluded that a large part of this deficiency is due to heredity.

3. The data for this investigation are based on one thousand one hundred forty-seven cases of mountain children in public schools, fairly representative of isolated mountain sections. Half of these children were given two tests, the Illinois and the Dearborn, for the purpose of comparing and checking results.

4. In comparing the two tests there is found no marked difference 
in the intelligence of mountain children as measured by the Illinois 
and Dearborn tests, although a slight advantage is given by the 
Dearborn test especially for the lower ages. There is no marked 
difference in IQ between the Dearborn and Illinois tests according 
to grade, but a greater difference in favor of the Dearborn test accord- 
ing to chronological age and school. Both of these tests seem to show 
the same general trends for the entire group of mountain children. 

5. The Dearborn and Illinois tests show median IQ’s for mountain children of eighty-two and seventy-eight, respectively, the higher IQ being in favor of the Dearborn test.

6. The median IQ of mountain children seems to be near normal 
at age six but shows a fairly consistent decrease in intelligence with an 
increase in chronological age. 

7. There is a marked retardation among mountain children: 
approximately one and a half years in the first grade to two years in the 
eighth grade. 

8. The standards for both rural and city children are higher than 
the medians for the mountain children, as measured by the Illinois 
test. 

9. The analysis of the Dearborn IA tests for Grade I shows that 
as a general rule a larger percentage of mountain children miss the 
test items which were based on some phase of environmental influence, 
in comparison with a normal unselected city group of the same grade. 

10. The general trend of this investigation indicates that the results 
of both tests are materially affected by environmental factors, and 
that the mountain children are not as far below the normal as the 
tests seem to indicate. With the proper environmental changes the 
mountain children might test near a normal group. 
THE VALUE OF OBJECTIVE TESTS AS TEACHING
DEVICES IN EDUCATIONAL PSYCHOLOGY


O. E. HERTZBERG 


New York State Teachers College at Buffalo 
AND 


J. D. HEILMAN AND H. W. LEUENBERGER 


Colorado State Teachers College at Greeley 


Study after study has been made to aid the elementary and 
secondary school teacher in the improvement of his teaching, but 
comparatively few carefully controlled experiments have been con- 
ducted for the purpose of improving college instruction. The criticism 
has often been made that the poorest instruction in our entire edu- 
cational system, at least as far as method is concerned, occurs in our 
colleges and universities. This may be due, partly, to the fact that 
until very recently a man’s knowledge of a subject was a sufficient 
voucher for his ability to teach it. There has been a gradual awaken- 
ing, however, to the fact that knowledge of subject-matter and teach- 
ing ability are not synonymous, and to the great need of evaluating 
and improving instruction on the higher levels. Recent studies have 
been made to determine the relative value of lecture, laboratory and 
discussion methods of teaching college classes, the value of intelligence 
tests in prognosis, the reliability of various kinds of examinations, the 
effect of pre-testing, and the effect of supplying the students with 
syllabi and questions on the text.¹ ² The following investigation into
the value of using carefully constructed objective tests as teaching 
devices in classes in educational psychology falls in line with this 
general trend of interest in improving college teaching, and is, so far 
as the writers have been able to discover, the first investigation into 
the efficacy of this particular method. 

1 Jersild, A. T.: Examination as an Aid to Learning. Journal of Educational Psychology, November, 1929.
2 Robinson, L. J.: “The Value to College Students of Lists of Questions on a Text.” Unpublished Master of Arts Thesis, Colorado State Teachers College, Greeley, Colorado, 1926.

In order to determine the value of objective tests as teaching devices, it was necessary to secure two groups: an experimental group, which was to have the use of objective tests as study aids throughout the course, and a control group, to which study tests were not to be given. These groups were secured in the Educational Psychology Department of Colorado State Teachers College at Greeley from members of the sophomore class. The control group in each part of the experiment consisted of students in educational psychology during the Winter quarter, 1929-1930. The experimental group was made up of students studying the same course during the Spring quarter, 1930. Both groups studied the same subject-matter, spent the same time on each unit of the course, and were taught by the same instructor (Hertzberg). The only difference in the procedure of the two groups lay in the distribution of study tests to the experimental group.

The groups were equated on the basis of percentile scores made 
on the Thurstone Psychological Examination, an intelligence test 
used at Colorado State Teachers College for classification purposes. 
It was considered unnecessary to equate the groups on the basis of 
previous training in educational psychology. Unpublished studies 
made by the psychology instructors at Colorado State Teachers 
College have indicated that such previous training would not affect 
the scores in this particular course. 
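The article says only that the groups were equated on the basis of percentile scores and, later, that students were paired; it does not describe the pairing procedure. The Python sketch below shows one plausible way to form such pairs, a greedy nearest-percentile match, and should be read as our assumption rather than the authors' method. The function name and the sample students are hypothetical.

```python
from typing import List, Tuple


def pair_by_percentile(
    experimental: List[Tuple[str, int]],
    control: List[Tuple[str, int]],
) -> List[Tuple[str, str, int]]:
    """Greedily pair each experimental student with the unused control
    student whose Thurstone percentile is closest (ties: first in list).

    Returns (experimental_id, control_id, percentile_gap) triples.
    Students left over in the larger group are simply not paired.
    """
    unused = list(control)
    pairs = []
    # Handle the highest percentiles first so extreme scorers get close matches.
    for exp_id, exp_pct in sorted(experimental, key=lambda s: -s[1]):
        if not unused:
            break
        best = min(unused, key=lambda s: abs(s[1] - exp_pct))
        unused.remove(best)
        pairs.append((exp_id, best[0], abs(best[1] - exp_pct)))
    return pairs


# Hypothetical students: (id, percentile on the Thurstone examination).
spring = [("S1", 91), ("S2", 55), ("S3", 12)]
winter = [("W1", 50), ("W2", 88), ("W3", 15), ("W4", 70)]
for exp_id, ctl_id, gap in pair_by_percentile(spring, winter):
    print(exp_id, ctl_id, gap)
```

A greedy match is simple but not optimal; an exact minimum-total-gap assignment would need a matching algorithm, which seems unnecessary for groups of this size.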

The subject-matter of the course in educational psychology had 
been divided into psychologically organized units by the department 
instructors. One of the writers (Leuenberger) constructed new-type 
objective tests for each unit of the course, the nature of the material 
determining the type of test—true-false, multiple-choice, completion— 
that was constructed. 

In the experimental group the following procedure was carried 
out for each unit’s work. The first day was given over to the assign- 
ment of the unit’s work; the second day was spent in study and recita- 
tion; and on the third day the study tests were administered in mimeo- 
graphed form, the students taking the tests without reference to their 
texts or notes. The tests were then scored, any wrong responses 
being checked, and given back to the students with the suggestions 
that they correct their errors by reference to their texts or notes, 
and that they ask the instructor any questions they desired con- 
cerning items in the test. The students were allowed to keep the 
tests until the day of the instructor’s regular examination at the end 
of the unit’s work. In this way the study tests were used by the students in any way they desired. No objective tests were given as aids to the final examination.
Examinations constructed by the instructor were used throughout the study as a basis of comparison between the control and the experimental groups. These examinations were used to measure the achievement of all students in the educational psychology course used in this study and were not seen by the experimenter (Leuenberger) before or during the time when he was constructing the study aids. A comparison made at the conclusion of the experiment showed that in no case were the study tests and the instructor’s examinations identical in form.

At the conclusion of the study the whole experiment fell logically into the following three divisions: (1) the preliminary experiment, which compared results obtained on the first unit’s work, completed by forty-nine students in each group; (2) the second experiment, which compared results obtained on the work of the second, third, fourth, fifth, and sixth units, completed by eighty-six students in each group; and (3) a comparison of the final examination results of the two groups of students studied in the second experiment.

The comparison of the results of each part of the study was made on the basis of the averages, the standard errors of the averages, the standard deviations and their standard errors, the coefficients of variation, and finally, and most significantly, the standard error of the difference between the averages of the control and experimental groups. In addition to the standard error of the difference of the averages, the experimental coefficient of each part of the study was computed in order to determine the reliability of the differences between the averages.¹
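The footnote to this paragraph gives a short and a long formula for the standard error of a difference, and the text relies on McCall's experimental coefficient. The Python sketch below expresses both. We assume, because it reproduces the coefficients printed in this study (1.22, 1.72, and .37), that McCall's coefficient is the difference divided by 2.78 times its standard error; the function names are our own.

```python
import math


def se_difference(se1: float, se2: float, r: float = 0.0) -> float:
    """Standard error of the difference between two averages.

    With r = 0 this is the footnote's short formula, sqrt(se1**2 + se2**2).
    A nonzero r (the correlation between paired measures) gives the long
    formula, which subtracts 2 * r * se1 * se2 under the square root.
    """
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


def experimental_coefficient(difference: float, se_diff: float) -> float:
    """Experimental coefficient, assumed to be D / (2.78 * SE_D).

    On this reading a value of 1.00 or more marks a 'completely reliable'
    difference in McCall's sense.
    """
    return difference / (2.78 * se_diff)


# Made-up illustration: two averages, each with a standard error of 2.0.
print(round(se_difference(2.0, 2.0), 3))         # short formula
print(round(se_difference(2.0, 2.0, r=0.5), 3))  # long formula with r = 0.5
```

The example shows why the long formula matters for paired groups: a positive correlation between the pairs shrinks the standard error of the difference, here from about 2.83 to 2.0.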


THE PRELIMINARY EXPERIMENT 


For the first experiment, forty-nine Spring quarter students were 
paired with forty-nine Winter quarter students. The results as 
determined by the instructor’s examination on the first unit’s work are 
given in Table I. This table should be read in the following manner. 





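1 There are two formulae for finding the standard error of the difference: the long formula

$$\sigma_{\mathrm{diff}} = \sqrt{\sigma^2_{\mathrm{av.}1} + \sigma^2_{\mathrm{av.}2} - 2r\,\sigma_{\mathrm{av.}1}\,\sigma_{\mathrm{av.}2}}$$

and the short formula

$$\sigma_{\mathrm{diff}} = \sqrt{\sigma^2_{\mathrm{av.}1} + \sigma^2_{\mathrm{av.}2}}$$

where r is the correlation between the paired groups. In reporting the results of this study, the short formula has been used, as there was no appreciable difference in the results of the study when both formulae were used, i.e., in the second and third parts of the study.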
The average score made by students in the control group was 66.28, with a standard error of 2.25. The average score of the experimental group was 76.27, with a standard error of 1.96. The standard deviation was 15.73 for the control group and 13.68 for the experimental group. The coefficient of variability was 23.73 for the control group and 17.94 for the experimental group. The difference between the obtained averages was 10.10 points, with a standard error of 2.98. The experimental coefficient was found to be 1.22. According to McCall,¹ if the experimental coefficient is 1.00, the difference between the obtained averages can be considered completely reliable.
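The figures for the preliminary experiment can be rechecked arithmetically. The Python sketch below recomputes the standard errors of the averages as σ/√N with N = 49, combines them with the short formula, and forms the experimental coefficient under the assumed definition D/(2.78 SE_D). One caution: the printed averages differ by 9.99 points, not the reported 10.10; the sketch uses 10.10, the value that reproduces the printed coefficient of 1.22.

```python
import math

N = 49  # students in each group in the preliminary experiment
control_mean, control_sd = 66.28, 15.73
experimental_mean, experimental_sd = 76.27, 13.68

# Standard error of an average: sigma / sqrt(N).
se_control = control_sd / math.sqrt(N)
se_experimental = experimental_sd / math.sqrt(N)

# Short formula for the standard error of the difference.
se_diff = math.sqrt(se_control ** 2 + se_experimental ** 2)

# The text reports D = 10.10, although the printed averages differ by 9.99.
reported_difference = 10.10
ec = reported_difference / (2.78 * se_diff)  # assumed McCall definition

print(f"difference of printed averages: {experimental_mean - control_mean:.2f}")
print(f"SE of averages: {se_control:.2f}, {se_experimental:.2f}")
print(f"SE of difference: {se_diff:.2f}; experimental coefficient: {ec:.2f}")
```

It gives standard errors of 2.25 and 1.95 (the text prints 1.96), a standard error of the difference of 2.98, and a coefficient of 1.22.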

In summary, Table I shows that the experimental group, employing objective tests as study aids, is superior in achievement to the control group, which did not have this help. Since the experimental coefficient is 1.22 times the size it would need to be to show complete reliability of the difference between the obtained averages, there is evidence that, in this experiment, objective tests as study aids did increase the achievement of the students involved. Furthermore, the smaller standard deviation and coefficient of variation of the experimental group show that the use of objective tests as aids to study tended to make the group using them more homogeneous in achievement than the group not using them. In this case the relative variability of the experimental group was seventy-six per cent of that of the control group.
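The "relative variability" figure is a ratio of coefficients of variation. A minimal Python sketch, taking V as 100σ/M (Pearson's coefficient, which reproduces the printed 23.73 and 17.94):

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Pearson's V: the standard deviation as a percentage of the mean."""
    return 100.0 * sd / mean


v_control = coefficient_of_variation(15.73, 66.28)
v_experimental = coefficient_of_variation(13.68, 76.27)
# Relative variability: experimental V as a percentage of control V.
relative = 100.0 * v_experimental / v_control

print(f"V control {v_control:.2f}, V experimental {v_experimental:.2f}")
print(f"relative variability {relative:.1f} per cent")
```

The ratio comes to about 75.6 per cent, which the text rounds to seventy-six.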


THE SECOND EXPERIMENT 


For the second experiment, eighty-six Spring quarter students, composing the experimental group, were paired with a like number of Winter quarter students, composing the control group. This part of the study covered five units of the course, as against the one unit covered in the preliminary experiment. An examination of Table II shows an obtained average of 153.83 for the control group with a standard error of 2.82, and an obtained average of 171.98 for the experimental group with a standard error of 2.53. This gives a difference in the obtained averages of 18.15 points in favor of the experimental group. As in the first experiment, the standard error of the average, the standard deviation, and the coefficient of variation are lower for the experimental group than for the control group. The standard error of the difference of the obtained averages (18.15) is 3.78. The experimental coefficient is 1.72, that is, 1.72 times as large as it would need to be for complete reliability.

1 McCall, W. A.: “How to Experiment in Education.” The Macmillan Company, New York, 1923.

The above figures show a completely reliable difference in favor 
of the experimental group. As in the first experiment, the experi- 
mental group, with a showing here of 80.15 per cent as much relative 
variability as the control group, is the more homogeneous. 
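The second experiment's reliability figures can be rechecked the same way. This Python sketch uses only the averages and standard errors quoted from Table II, the short formula, and the assumed McCall definition D/(2.78 SE_D).

```python
import math

control_mean, control_se = 153.83, 2.82
experimental_mean, experimental_se = 171.98, 2.53

difference = experimental_mean - control_mean                # 18.15 points
se_diff = math.sqrt(control_se ** 2 + experimental_se ** 2)  # short formula
ec = difference / (2.78 * se_diff)                           # assumed McCall definition

print(f"D = {difference:.2f}, SE_D = {se_diff:.2f}, EC = {ec:.2f}")
```

The computed standard error of the difference is about 3.79; the printed 3.78 presumably reflects rounding in the original computation. The coefficient comes out at 1.72 either way.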


A COMPARISON OF THE TWO GROUPS ON THE BASIS OF THE
INSTRUCTOR’S FINAL EXAMINATION


For this part of the study the same groups that took part in the second experiment were used. The measure was the instructor’s final examination, a comprehensive sampling of all the work of the course in educational psychology. While the experimental group did not have the use of objective tests for reviewing for the final examination, its members had used the tests as aids in studying the material throughout the course.

An examination of Table III shows the control group to have an average of 109.48 with a standard error of 1.58 points. The experimental group’s average is 107.38 with a standard error of 1.30. The standard deviations are 14.69 for the control group and 12.08 for the experimental group; the coefficients of variation are 13.42 and 11.25, respectively. The difference of the averages on the final examination is 2.09 points in favor of the control group, with a standard error of 2.05. The experimental coefficient is .37, only about a third of the 1.00 necessary for complete reliability. The experimental group again possesses slightly more relative homogeneity than the control group, being 83.82 per cent as variable. This may be due to the fact that this group used the study tests throughout the quarter. While the above figures must be interpreted with caution, it can safely be said that, as far as this study is concerned, the control group tends to equal or slightly exceed the experimental group when neither group has used objective tests as aids to studying for the final examination. This study would indicate that objective tests used as aids to general study throughout a course have little value for permanency of retention.
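For the final examination the sign of the difference reverses. The Python sketch below, built on the same assumptions as before (short formula, EC = D/(2.78 SE_D), V = 100σ/M), recomputes the figures quoted from Table III.

```python
import math

control = {"mean": 109.48, "sd": 14.69, "se": 1.58}
experimental = {"mean": 107.38, "sd": 12.08, "se": 1.30}

difference = control["mean"] - experimental["mean"]  # positive: favours the control group
se_diff = math.sqrt(control["se"] ** 2 + experimental["se"] ** 2)  # short formula
ec = difference / (2.78 * se_diff)  # assumed McCall definition

v_control = 100 * control["sd"] / control["mean"]
v_experimental = 100 * experimental["sd"] / experimental["mean"]
relative = 100 * v_experimental / v_control  # per cent of the control group's V

print(f"D = {difference:.2f}, SE_D = {se_diff:.2f}, EC = {ec:.2f}")
print(f"relative variability {relative:.1f} per cent")
```

It yields a difference of 2.10 (printed as 2.09), a standard error of 2.05, a coefficient of about .37, well short of 1.00, and a relative variability of about 83.8 per cent.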


GENERAL SUMMARY AND CONCLUSIONS 


The above investigation was made to determine the value of objective tests as teaching devices in classes in educational psychology at Colorado State Teachers College at Greeley. In the first two parts of the study, the groups using the objective tests as study aids were found to be superior in achievement to the control groups, the differences in each case being reliable ones. Furthermore, throughout the study, the groups using the tests were more homogeneous than the groups not using them. As it is an accepted principle in education that homogeneous groups are easier to teach than heterogeneous ones, the very homogeneity created through the use of these tests should make them well worthwhile as teaching devices.


TABLE I.—SCORES MADE BY CONTROL AND EXPERIMENTAL GROUPS ON
INSTRUCTOR’S EXAMINATION IN THE FIRST PART OF AN EXPERIMENT
TO DETERMINE THE VALUE OF OBJECTIVE TESTS AS TEACHING
DEVICES IN EDUCATIONAL PSYCHOLOGY CLASSES, COLORADO
STATE TEACHERS COLLEGE AT GREELEY, 1929-1930
The last part of the study, in which the results of the groups on the instructor’s final examination were compared, tends to show that, as far as the procedure employed in this study is concerned, objective tests do not aid the students in achievement that requires delayed recall. It is possible that the students of the experimental group were so accustomed to the objective tests as study helps that they were unable to review without them. Had the objective tests been given back to the students to serve as a basis for review, the results on the final examination might have been different.


TABLE II.—SCORES MADE BY CONTROL AND EXPERIMENTAL GROUPS ON
INSTRUCTOR’S EXAMINATIONS ON FIVE UNITS’ WORK IN AN EXPERIMENT
TO DETERMINE THE VALUE OF OBJECTIVE TESTS AS TEACHING DEVICES
IN EDUCATIONAL PSYCHOLOGY CLASSES, COLORADO STATE TEACHERS
COLLEGE AT GREELEY, 1929-1930

TABLE III.—COMPARISON OF SCORES MADE BY CONTROL AND EXPERIMENTAL
GROUPS ON THE INSTRUCTOR’S FINAL EXAMINATION IN AN EXPERIMENT TO
DETERMINE THE VALUE OF OBJECTIVE TESTS AS TEACHING DEVICES IN
EDUCATIONAL PSYCHOLOGY CLASSES, COLORADO STATE TEACHERS
COLLEGE AT GREELEY, 1929-1930

This study should be continued in educational psychology classes throughout several quarters in order to check on the results given here. It would be especially valuable to discover what differences, if any, might be obtained if the objective tests used throughout the quarter were returned to the experimental group for purposes of review just before the final examination. Another valuable study could be made in which the instructor would review the study tests a day or two after they had been handed back to the students to recheck the errors which had been made and previously checked. This procedure might tend to further the achievement of all students, instead of only those who were seriously interested in employing the tests as study aids.
SEX DIFFERENCES IN MATHEMATICAL
ACHIEVEMENT OF JUNIOR COLLEGE STUDENTS 


WALTER CROSBY EELLS 


Professor of Education, Stanford University 


AND 


CLEMENT S. FOX 


Superintendent of Schools, Gilbert, Arizona 


Numerous measurements and studies have been made of differences 
in mental ability and academic achievement for the two sexes in the 
elementary school; some have been made in the high school; but few, 
if any, have been made at the junior college level. Many of the studies 
reported have been based upon too small a number of cases to make 
it possible to equalize divergent factors and to obtain results which 
were statistically significant. It is the object of this paper to report 
the result of a study of differences in mathematical achievement of 
the two sexes for over 6000 students in California junior colleges. 


DATA AVAILABLE


During the college year, 1929-1930, in the California Junior College 
Mental-educational Survey, over 11,000 students in forty-seven junior 
colleges were given two standard tests—the Thurstone or American 
Council on Education Psychological Examination for High School 
Graduates and College Freshmen, 1928 edition, and the Iowa High 
School Content Examination, Form B. Practically all students were 
tested under approximately uniform conditions within a month of 
the opening of the college year. All papers were sent to Stanford 
University to be scored and summarized. In addition, a personal 
data card was secured for each student, giving in addition to many 
other facts, sex, age, class in junior college, and number of high school 
units completed in mathematics.¹ This study is restricted to an analysis of the work of low freshmen only, on the mathematics section of the Iowa test. Complete test data and also complete high school mathematics records were available for 3323 men and 2720 women of the entering freshman class.

1 For more complete details of the survey and a general analysis of the results, see Eells, Walter Crosby: California Junior College Mental-educational Survey. Bulletin No. J-3, California State Department of Education, Sacramento, California, 1930, pp. 61; and an article by the same author with the same title in Educational Record, Vol. XI, October, 1930, pp. 281-91.


COMPARISON OF GROSS SCORES


A comparison of total scores on the mathematics test is shown in 
Table I. The maximum possible score was seventy-five. 


TABLE I.—COMPARISONS OF GROSS SCORES FOR ENTIRE GROUP OF LOW FRESHMEN
ON MATHEMATICS TEST

           | Mean  | Sigma | PE of mean | Maximum
Men        | 34.74 | 14.87 | 0.17       | 72
Women      | 25.20 | 12.29 | 0.15       | 71
Difference | 9.54  |       | 0.23       |





The difference is highly significant, being over forty times its probable error. That the two sexes were approximately equal in general mental ability is shown by their mean scores on the Thurstone Psychological Examination: 138.0 ± 0.6 for the men and 136.8 ± 0.6 for the women. The absolute variability of the men seems to be greater than that of the women, but when the difference in means is taken into account, the relative variability of the women is seen to be considerably greater. The Pearson coefficient of variability, V, is forty-three for the men and forty-nine for the women, showing the men to be eighty-eight per cent as variable as the women.
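The significance and variability statements can be verified from Table I. The Python sketch below combines the two probable errors in quadrature, as the table's ± 0.23 implies, and computes Pearson's V as 100σ/M. Variable names are illustrative.

```python
import math

men = {"mean": 34.74, "sigma": 14.87, "pe": 0.17}
women = {"mean": 25.20, "sigma": 12.29, "pe": 0.15}

difference = men["mean"] - women["mean"]
pe_diff = math.sqrt(men["pe"] ** 2 + women["pe"] ** 2)  # PEs combine like SEs
ratio = difference / pe_diff  # how many probable errors the difference spans

v_men = 100 * men["sigma"] / men["mean"]        # Pearson's V
v_women = 100 * women["sigma"] / women["mean"]

print(f"difference {difference:.2f} +/- {pe_diff:.2f} ({ratio:.1f} PE)")
print(f"V: men {v_men:.0f}, women {v_women:.0f}; men {100 * v_men / v_women:.0f}% as variable")
```

The difference comes to about forty-two times its probable error, and V rounds to 43 and 49.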


EQUALIZATION OF PREPARATION 


The average amount of high school mathematics taken by the men was 2.4 units; by the women, 1.8 units, only three-quarters as much. It might be supposed, therefore, that the differences shown in Table I could easily be accounted for by this factor alone. To investigate this supposition the students were grouped according to number of units in mathematics secured. Results are summarized in Table II, and shown graphically in Fig. 1.

TABLE II.—COMPARISON OF MATHEMATICS SCORES OF LOW FRESHMEN ACCORDING
TO NUMBER OF UNITS OF HIGH SCHOOL MATHEMATICS TAKEN

Units        | Men                   | Women                 | Difference
             | Number | Mean  | PE   | Number | Mean  | PE   | Amount | PE
0            | 96     | 17.00 | 0.90 | 106    | 8.92  | 0.44 | 8.08   | 1.00
½-1          | 274    | 20.20 | 0.42 | 371    | 17.22 | 0.27 | 2.98   | 0.50
1½-2         | 1254   | 28.48 | 0.20 | 1679   | 25.00 | 0.14 | 3.48   | 0.24
2½-3         | 901    | 38.50 | 0.27 | 433    | 34.36 | 0.40 | 4.14   | 0.48
3½ and over  | 798    | 48.70 | 0.29 | 131    | 47.32 | 0.83 | 1.38   | 0.88
Total        | 3323   |       |      | 2720   |       |      |        |

Thus when the factor of high school preparation is equalized it is
found that there is still a uniform superiority in mathematics on the
part of the men. The two lines of Fig. 1 are nearly straight and
parallel. Mathematical knowledge on entering the junior college
increases almost linearly with the amount of high school preparation;
differences between the sexes are practically uniform and are in favor
of the men at all levels of preparation, significantly so in all groups
except those having 3½ or more units to their credit. Except at the
extremes, the differences between men and women for equated groups
are practically the same within the limits of the probable error.

Fig. 1.—Comparison of mathematics scores of low freshmen according to number
of units of high school mathematics taken. (N = 6043)
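The significance statements for Table II rest on the probable error of each difference. For two independent means the PE of the difference is the square root of the sum of the squared PEs, and the difference divided by that PE is the critical ratio. The Python sketch below recomputes both from the group means and PEs in Table II. The cut-off of four probable errors is a hypothetical threshold chosen for illustration; the article does not state the one it used.

```python
import math

# (units, men mean, men PE, women mean, women PE), read from Table II
TABLE_II = [
    ("0", 17.00, 0.90, 8.92, 0.44),
    ("1/2-1", 20.20, 0.42, 17.22, 0.27),
    ("1 1/2-2", 28.48, 0.20, 25.00, 0.14),
    ("2 1/2-3", 38.50, 0.27, 34.36, 0.40),
    ("3 1/2 and over", 48.70, 0.29, 47.32, 0.83),
]

CUTOFF = 4.0  # hypothetical threshold in probable errors; not stated in the article


def difference_with_pe(mean1, pe1, mean2, pe2):
    """Difference of two independent means and the probable error of that difference."""
    return mean1 - mean2, math.sqrt(pe1 ** 2 + pe2 ** 2)


results = {}
for units, m_men, pe_men, m_women, pe_women in TABLE_II:
    diff, pe_diff = difference_with_pe(m_men, pe_men, m_women, pe_women)
    ratio = diff / pe_diff  # critical ratio: the difference measured in units of its PE
    results[units] = ratio
    verdict = "significant" if ratio >= CUTOFF else "not significant"
    print(f"{units:>15}: diff {diff:5.2f}  PE {pe_diff:4.2f}  ratio {ratio:5.1f}  {verdict}")
```

The recomputed PEs agree with the Difference columns of Table II. Every group except the 3½-and-over group clears the assumed cut-off comfortably, while that group's difference is only about one and a half times its PE, consistent with the text.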
EQUALIZATION OF AGE


The mean age of the men was 19.1 years; of the women, 18.3 years.
It might be supposed that the greater maturity of the men could
account for the differences shown. To investigate this supposition,
the almost three thousand students who had 1½ to 2 units of high
school mathematics were grouped by age at last birthday. Results
are shown in Table III and in Fig. 2.
Fig. 2.—Comparisons of mathematics scores of low freshmen with same number of
high school units according to age of students. (N = 2919)


Thus when the two factors of high school preparation and age
are equalized simultaneously, there is a distinct and significant
superiority of men at every age level. On Fig. 2 the lines again tend
to be straight and parallel.


STUDY OF RESPONSES TO ITEMS OF THE TEST


Finally, a study was made of the responses of the two sexes to
individual items of the test. For this purpose papers were taken for
the largest age group (eighteen years) of the students having 1½ to 2
units of high school mathematics. Table III indicates that there were
1044 students in this group. To avoid excessive labor, one hundred
papers were taken at random for the men and one hundred for the
women for analysis, subject only to the restriction that no papers were
taken in which a student had failed to attempt any of the last
twenty-five items.

TABLE III.—COMPARISONS OF MATHEMATICS SCORES OF LOW FRESHMEN WITH
SAME NUMBER OF HIGH SCHOOL UNITS ACCORDING TO AGE OF STUDENTS

              |          Men          |         Women         |  Difference
Age           | Number | Mean  | PE   | Number | Mean  | PE   | Amount | PE
14–16         |     44 | 32.42 | 0.38 |    117 | 27.80 | 0.53 |   4.62 | 0.65
17            |    227 | 30.70 | 0.49 |    519 | 25.00 | 0.26 |   5.70 | 0.55
18            |    389 | 28.45 | 0.35 |    655 | 23.15 | 0.21 |   5.30 | 0.41
19            |    291 | 28.20 | 0.40 |    247 | 20.55 | 0.31 |   7.65 | 0.51
20            |    147 | 25.75 | 0.54 |     83 | 21.04 | 0.64 |   4.71 | 0.84
21 and over   |    144 | 26.35 | 0.74 |     56 | 19.13 | 0.55 |   7.22 | 0.92
Total         |   1242 |       |      |   1677 |       |      |        |

TABLE IV.—EXCESS IN PER CENT OF CORRECT RESPONSES TO ITEMS OF MEN
OVER THOSE OF WOMEN FOR MATHEMATICS SECTION OF IOWA HIGH SCHOOL
CONTENT EXAMINATION, FORM B

Item | Excess in per cent || Item | Excess in per cent || Item | Excess in per cent || Item | Excess in per cent
  54 |       37           ||   44 |       13           ||    3 |        8           ||   60 |        3
  56 |       25           ||   52 |       13           ||   15 |        7           ||   40 |        3
  10 |       23           ||   38 |       12           ||   32 |        7           ||   31 |        3
  37 |       22           ||   12 |       12           ||   69 |        7           ||   14 |        3
  22 |       20           ||    ? |        ?           ||    ? |        ?           ||   18 |        2
  17 |       19           ||   26 |       11           ||   42 |        6           ||   67 |        2
  20 |       19           ||   39 |       11           ||   30 |        6           ||   74 |        2
  50 |       19           ||   46 |       11           ||   29 |        6           ||   73 |        2
  59 |       18           ||   55 |       11           ||    ? |        5           ||   72 |        2
  36 |       18           ||   47 |       10           ||    4 |        5           ||    8 |        2
  27 |       18           ||   34 |       10           ||   62 |        5           ||   11 |        1
  35 |       17           ||   33 |       10           ||   65 |        5           ||   48 |        0
  13 |       17           ||   16 |       10           ||   70 |        5           ||    5 |        0
   2 |       16           ||   45 |        9           ||   75 |        5           ||      |
  58 |       14           ||   66 |        8           ||    9 |        4           ||   68 |       −1
  24 |       14           ||   57 |        8           ||   21 |        4           ||   49 |       −1
  19 |       14           ||   53 |        8           ||   64 |        3           ||   41 |       −1
   7 |       14           ||   25 |        8           ||   63 |        3           ||    1 |       −1
  43 |       13           ||    6 |        8           ||   61 |        3           ||   51 |       −8

The median number of items attempted by the men was fifty-two, 
by the women forty-seven, showing a tendency for the men to work 
more rapidly than the women. 

The per cent of correct responses for men and for women on each
item was computed, and from this the excess in per cent of correct
responses of the men over the women was found. The performance of
the women exceeded that of the men in only five of the seventy-five
items, and in four of these the difference was only one per cent, too
small to be significant. The only item in which there was a
significant difference in favor of the women (eight per cent) was one
from plane geometry, relating to concentric circles. On the other hand,
the men showed an excess of ten per cent or more in no fewer than
thirty-two items. The greatest difference in favor of the men,
thirty-seven per cent, was on an item of information: the number of
inches in a meter. A summary of the excess in per cent in favor
of the men is shown in Table IV.
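The item analysis reduces to one subtraction of percentages per item followed by a tally. The Python sketch below illustrates it under the assumption that each item's per cent correct is taken over the whole sample of papers (one hundred per sex here); the function names are hypothetical. It is run on the rightmost pair of columns of Table IV, which is where all the items favoring the women fall.

```python
def excess_in_per_cent(correct_men, correct_women, n_men=100, n_women=100):
    """Per cent correct among the men minus per cent correct among the women, for one item."""
    return 100.0 * correct_men / n_men - 100.0 * correct_women / n_women


def summarize(excesses):
    """Tally how the item-by-item differences fall."""
    lowest = min(excesses)
    return {
        "items": len(excesses),
        "favor_women": sum(1 for e in excesses if e < 0),
        "ten_or_more_for_men": sum(1 for e in excesses if e >= 10),
        "largest_for_women": -lowest if lowest < 0 else 0,
    }


# Excess values from the rightmost pair of columns of Table IV
last_column = [3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 1, 0, 0, -1, -1, -1, -1, -8]
print(summarize(last_column))
```

Applied to that column, the tally finds the five items favoring the women, the largest by eight per cent, in agreement with the text.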


SUMMARY 


An analysis of the scores of over 6000 low freshmen in California
junior colleges on the mathematics section of the Iowa High School
Content Examination shows significant differences in favor of the men
when the factors of high school preparation and age are equalized,
and when a study is made of responses on individual test items.
REPORT ON AN ATTEMPT AT THE PROGNOSIS OF
UNUSUALLY GOOD AND UNUSUALLY POOR 
SCHOLASTIC WORK 


LUELLA COLE PRESSEY 


Ohio State University 


I. Nature of the Investigation.—During Freshman Week, the first week
of the Fall Quarter of 1929 at Ohio State University, a series of tests
was given. These tests included measures in reading, English,
mathematics, history, language and intelligence. The scores made by
entering freshmen were listed. A week after the beginning of the
quarter the writer went through this list and marked in red the names
of those students who appeared, on the basis of all available test
scores,¹ to be excellent “educational risks,” and in blue the names of
those students who seemed to be very poor “educational risks.” A copy
of this list was deposited with Dr. W. W. Charters (of the Bureau of
Educational Research) during the third week of the quarter and before
the appearance of any scholastic reports. There were one hundred
sixty-eight students marked as very promising, and two hundred
thirty-five marked as very unpromising. (Students having incomplete
test records were not included; over two thousand records, however,
were complete.)

One complication of the results should be mentioned. During the
Fall Quarter, sixty-two of the lower group were in a class in “remedial
reading,” where they received extra help. These students were not
dropped from the list but have been considered separately, since they
were given remedial treatment.

At the end of the quarter the grades in all subjects (not counting
physical education, military and survey courses) were found. There
were twenty-three students who had withdrawn, or were incomplete,
or whose grades could not be found. There remained a total of (a)
one hundred sixty-three in the “very promising” group, (b) one hundred
fifty-six in the “very unpromising” group who had had no special
training, and (c) sixty-one in this latter group who had been given
remedial treatment. The remainder of the report will deal with the
grades obtained by these three groups.





¹ The results of all tests were given in terms of percentiles; those students in
the “good” group had an average percentile of eighty-five or above, those in the
“poor” group of fifteen or below.
II. Results of the Investigation.—The point-hour-ratios obtained
by these groups are presented below:











TABLE I

Point-hour-ratio    | Promising   | Unpromising   | Unpromising
                    |             | and untrained | and trained
D = 1.0 and below   |   1         |  89* (57%)    |  26 (42%)
1.1–1.4             |   2         |  28  (18%)    |   7 (11%)
1.5–1.9             |   1         |  14           |  14
C = 2.0–2.4         |  23         |  15           |  11
2.5–2.9             |  27         |   8           |   3
B = 3.0–3.4         |  57 (35%)   |   2           |
3.5–3.9             |  33 (20%)   |               |
A = 4.0             |  19 (12%)   |               |
Total               | 163         | 156           |  61
Median              |   3.47      |    .88        |   1.32†





* Of these eighty-nine ratios, nineteen were zeros, that is, a complete “E”
record.

† These sixty-one students were among the poorest in the reading classes.
The median ratio for everyone trained in reading was 1.32 (without physical
education and military).


Assuming that anyone with an initial point-hour-ratio of 1.4 or below
is a “poor risk” educationally, it appears that seventy-five per cent
of the predictions at the lower end, when no assistance was given,
were correct. Even with training, fifty-three per cent of those
predicted as failures actually had ratios below 1.5. It would seem,
then, that the prediction at the lower end of the distribution was good.

Assuming that anyone with a ratio of 3.0 or better is an “excellent
risk” educationally, it would seem that sixty-seven per cent of the
predictions at the upper end were absolutely correct. Another
thirty-one per cent of these students made records between a C
and a B. In fact, only four cases, or two per cent, made ratios
below 2.0. In other words, one hundred fifty-nine of the one hundred
sixty-three were “good risks,” with one hundred nine being “excellent
risks.”
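The percentages in the two preceding paragraphs are shares of each group's column in the table of point-hour-ratios. The Python sketch below recomputes them from the counts; the three bands from 3.0 upward are merged, since only their total matters here, and the variable names are our own.

```python
def percent(count: int, total: int) -> float:
    """Share of a group, as a percentage."""
    return 100.0 * count / total


# Students per point-hour-ratio band, read from the table of point-hour-ratios.
# Band order: 1.0 and below, 1.1-1.4, 1.5-1.9, 2.0-2.4, 2.5-2.9, 3.0 and above.
promising = [1, 2, 1, 23, 27, 57 + 33 + 19]
unpromising_untrained = [89, 28, 14, 15, 8, 2]
unpromising_trained = [26, 7, 14, 11, 3, 0]

# Lower end: predicted failures who finished at 1.4 or below
low_untrained = percent(sum(unpromising_untrained[:2]), sum(unpromising_untrained))
low_trained = percent(sum(unpromising_trained[:2]), sum(unpromising_trained))

# Upper end: predicted successes at 3.0 or better, and those below 2.0
high_promising = percent(promising[-1], sum(promising))
below_c_promising = percent(sum(promising[:3]), sum(promising))

print(f"untrained at 1.4 or below: {low_untrained:.1f}%")
print(f"trained at 1.4 or below:   {low_trained:.1f}%")
print(f"promising at 3.0 or above: {high_promising:.1f}%")
print(f"promising below 2.0:       {below_c_promising:.1f}%")
```

Computed directly from the counts, the trained group's share at 1.4 or below comes to about 54 per cent; the fifty-three per cent in the text is the sum of the two printed percentages, 42 and 11.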

It is interesting that the “good” group made a total of nine hundred
fifty-four A’s and one hundred B’s, as against fifteen A’s and one
hundred thirty-six B’s for the “poor” group; the latter group made a
total of seven hundred seventy-five E’s and six hundred thirty D’s, as
against fifteen E’s and seventy-one D’s made by the former group.
III. Value of the Investigation.—The writer would like to suggest
two practical procedures on the basis of such results as those shown
above. (1) Every effort should be made to discourage students whose
prospects are poor, to persuade them to withdraw during the first few
weeks, and to suggest constructively other lines of work into which
they might go. These results indicate that such students are spending
money to no good purpose, are taking up everyone’s time and energy,
and are headed for disappointment and failure. Even with help,
fifty-three per cent of those studied made unacceptable records and
another twenty-three per cent made records which, if not improved
considerably, will soon eliminate them. There is no kindness in
allowing them to remain. It is probable that better tests and adequate
personnel work would permit the sorting out of those who have some
chance of success. (2) Those who show promise should be encouraged at
once. The names of all prospectively good students could be sent to
the instructors in whose classes they are enrolled. The instructors
could then reach out to these students, talk to them, find their
interests, put them in touch with others who can help them, and
encourage them in every way. It is entirely feasible, with such data at
hand, to make a determined drive, beginning about the second week in
the quarter, toward the development and conservation of talent among
the promising freshmen.

It is earnestly recommended that colleges institute such testing
programs and then make constructive use of the results to forestall the
despair of failure and to conserve and develop the intellectual resources
of the next generation.








BOOK REVIEWS 


New Minds: New Men?, by Thomas Woody. New York: The 
Macmillan Company, 1932. Pp. 528. 


It is difficult to estimate exactly the reasons for the interest in 
the Russian experiment which is so widespread in the United States. 
It may be due to the characteristic yearning for the novel, for the 
satisfaction which comes from a new sensation, or it may be due to 
gratitude that we are not like unto others who are engaged in a struggle 
to shape their lives anew. And yet in the present crisis in which 
American education finds itself the study of educational developments 
abroad cannot but be illuminating. Such a study would be valueless, 
however, if we start with the prejudice with which so many American 
students begin, that, because we spend more money and have larger 
enrollments, the American system is superior to all others. The 
important problem in education is after all not one of numbers but
what education is all about, what purposes it seeks to achieve, and how
successfully they are attained. There is much of which this country
may be proud in education but no one can claim that it has yet found 
itself in terms of an enriched life and culture. 

Dr. Woody was confronted with a difficult task in attempting 
to add another book to the already large volume of literature on the 
Soviet Revolution, but he has been amazingly successful in his under- 
taking. Eminently qualified by a scholarly knowledge of the Russian 
background, by command of the language, and by the opportunity to 
observe the progress of the Revolution at different intervals almost 
from its origin until 1930, he has produced a book which will compare 
favorably with any which have appeared up to the present. The 
reader may not find here a detailed description of the organization of 
Soviet political and economic institutions, but he will be guided 
through the whole process by which Soviet mentality is created from 
the cradle to the grave. Dr. Woody has traced the psychological 
growth of the individual in Soviet society step by step so that the 
reader can follow the method by which the whole structure grows from 
the foundation up to the capstone. This he has done objectively 
and, so far as I am concerned, it is impossible to detect from 
Dr. Woody’s presentation where his own sympathies lie. Not 
that this is important, for the essential contribution of this book 
and other books on Soviet education lies not in our attitudes to
the Revolution but in the challenge which it offers to American 
educators. 

Between those who spend their time in decrying the past and those 
who wish to harness the educational chariot to an inscrutable future 
the precious present is slipping and has slipped. If Soviet educational 
developments have anything to contribute to American thought it 
lies in this, that no system of education can hope to meet with success 
unless it is directed to some definite purpose, and that purpose must 
be social. The weakness of American education in the last few 
decades is that educators have talked of socialization and social 
motives without any attempt to give them reality in terms of the 
American scene; practice, of course, has proceeded on the assumption 
that a collection of individuals constitutes a society. If one point
stands out more than any other on the vast canvas which Dr. Woody 
has so clearly painted, it is that in Russia there is no break in gauge 
between school and life, that here one does see the meaning of the 
principle, which falls so glibly and yet so vacuously from the lips of 
American teachers, that education is life. It may at once be admitted 
that Soviet education is directed to propaganda and indoctrination, 
but escape from this principle is not to be found in the cult, so popular 
at present in the United States, of freedom and creative activity. The 
lesson for educators in this country is to discover whether there is 
such a thing as an American culture worth propagating, something 
which ought to become the possession of all instead of the flabby goals, 
aims, purposes and objectives which change with the regularity 
almost of women’s fashions. My own conviction is that there is 
something which can be defined and set up, if only we would give up 
the attempt to prove the respectability of education by showing 
that we can be as scientific as the physicist. Even the physicist is 
discovering that science cannot explain everything. Further there 
has also developed in America a serious confusion between methods 
and philosophy and much of current philosophy is at best methodology. 
Soviet education has at least the merit of keeping them separate. If 
Professor Dewey’s most recent statement of the aim of education 
(think for yourself; learn to act with and for others) be used as a 
criterion, Soviet education is deliberately designed to develop only one- 
half of it because it has a definite social, political, and economic aim 
before it. The real challenge lies in the discovery of a moral equivalent 
for Sovietism—call it Americanism or American culture or anything 







392 The Journal of Educational Psychology 


else that the teacher can grasp—about which the individual may think 
and for which he will act with and for others. Without this American 
education will be exposed to the winds of theories which last just 
until sufficient air is accumulated for a new blast. 

To Dr. Woody American educators owe a debt of gratitude for 
the clear, precise and detailed description of everything that goes 
to make up education in Russia—and the school is only a small part. 
His New Minds: New Men? will stand on its own merits as a contribu- 
tion to our knowledge of the Soviet régime, but Dr. Woody will have 
earned a double debt of gratitude if only American educators realize 
that there is in his book a challenge which cannot be overlooked. 

I. L. KANDEL. 
Teachers College, Columbia University. 





Society: Its Structure and Changes, by R. M. MacIver. New York:
Ray Long and Richard R. Smith, Inc., 1931. Pp. 591.


Dr. MacIver is impatient with “those sociologists who are content
to point to the guide-posts, who never feel happy when they pass
beyond their figures to the social truths they yield, who delight in
the ‘natural science’ approach, who think their work is done when they
have counted something and measured something else.” To him,
science is not to be limited altogether to “the arid schematism of
figures and tables and classifications,” and we must find “a clearer
illumination” of the working of society perhaps in the more “fragmentary
revelations of the social novelist, dramatist and essayist.”

One can appreciate and sympathize with this unmistakable
annoyance with the schools of sociologists who have plagued the
intellectual scene in this country until now, for they have in the large
been abject imitators and sycophants at the great shrine of “natural
science.” The latest form which this has taken has been playing
around with mountains of statistics, graphs, trend curves, etc. Dr.
MacIver is courageous if nothing else; he avers further that his
analysis of social phenomena is to be founded upon internal conscious
experience rather than upon “data.” To this extent the literary
artist takes up the cudgels for sociology again. The “mechanistic
simplicism” of social scientists, on the other hand, “will prove to be
nothing more than the bad dream of an engineer.”
The introductory section of this text, entitled Initiation, offers
definitions of primary social concepts such as community, association, 
institution, mores, and gives a primary analysis of the social relation- 
ship. The following section deals with social structure, its organiza- 
tion and functions and its regulative principles. The social order is 
then examined as it is related to and dependent upon environment. 
Finally, the processes of social change are evaluated in the interpreta- 
tion of trends and the significance of evolution. 

It is refreshing to see a first-rate social thinker discard false
allegiances and sterile imitativeness, but on the other hand it is
difficult to see how a reputable body of knowledge and doctrinal
analysis is to be built up on mere verbalism and footloose reasoning.
One demands some sort of “system” as a substitute for minute
statistical studies, and it is doubtful whether Prof. MacIver has
supplied us with this. This rather inductive type of social science is
apt to lead to another flood of the personal sociologies from which the
discipline has suffered heretofore. As a pen-picture of society the
book is useful and informative, but its coherence is not any too obvious.

NATHAN MILLER.
Carnegie Institute of Technology.





The Psychology of Children’s Drawings, by Helga Eng. New York: 
Harcourt, Brace & Co., 1931. Pp. VIII + 223. 


“The Psychology of Children’s Drawings” is interesting mainly 
because of the subject material. A large part of the book is devoted 
to a genetic study of the drawings made by the author’s niece from 
the age of ten months, when she made her first stroke, to the age of 
eight years when she was concerned with caricature and color. The
large number of reproductions appearing in the book indicates graph-
ically the development of this ability in what appears to be a fairly
representative child.

Having presented a report on a single case, Miss Eng continues 
with a rather elementary discussion of certain psychological factors 
involved in children’s drawings, offering material which can be 
skimmed in a few minutes. In the last chapter she toys with the 
badly battered recapitulation theory, and, after a few commendable 
reservations, finally concludes that “the comparison between children’s 
drawing and palaeolithic thus confirms the biogenetic law that the
development of the species is reproduced in the child.”’ 
HERBERT A. CARROLL. 
University of Minnesota. 





Readings in Industrial Psychology, by Bruce V. Moore and George W. 
Hartmann. New York: D. Appleton and Co., 1931. Pp. 
XL + 560. 


This splendid volume is a forceful reminder of the tremendous 
amount of progress that has been made in the field of applied
psychology within the past decade. It reveals in unmistakable fashion how
the science of psychology is replacing outworn superstitions and blind 
trial and error as instruments for the solution of industrial problems. 

It is a collection of readings, drawn from innumerable sources, 
and woven into what is on the whole an unusually orderly and system- 
atic outline, considering the difficulties inherent in such a compilation. 
In fact, so well arranged and classified are the selections that the 
volume may readily be used as a text for courses in applied psychology. 

The eighteen chapters are as follows: Introduction, Basic Principles, 
Popular vs. Scientific Procedures in Appraising Men, Technique of 
Personnel Selection, Rating Scales, Mental Tests and Individual 
Placement, Analysis of Occupational Interests, Vocational Guidance, 
Training the Worker, Efficiency and Scientific Management, Fatigue 
and Rest Periods, The Working Environment, Accidents, Monotony, 
Morale, Labor Unrest and Strikes, Leadership and Social Adjustment, 
Distributing the Product. Each chapter is made up of from ten to 
twenty excerpts from individual articles and books. 

The high quality of material may be illustrated by the names of the 
authors quoted. In the first chapter, for instance, the compilers use 
selections from W. V. Bingham, Hugo Munsterberg, J. F. Dashiell,
Madison Bentley, James Drever, J. McKeen Cattell, and Samuel 
Gompers. 

In the conquest of any new field of knowledge, such as that of 
applied psychology, there is a definite place for such a compilation. A 
unified textbook serves one distinct and useful purpose, while a book of 
readings is equally essential in another line. The authors of this 
excellently conceived and well-executed volume merit the gratitude of 
all teachers and students of applied psychology. 

C. C. CRAWFORD. 


University of Southern California. 
Psychopathic Personalities, by Eugen Kahn. New Haven: Yale
University Press, 1931. Pp. XII + 521. 


Beginning with a clinical or descriptive classification of psycho-
pathic types and discussing the recent trends toward psychological
and psychobiological explanations, the author develops a systematic
psychology of personality in general, and of the psychopathic
personality in particular. Personality for the author is a stratified, but
integrated, combination of impulses, temperament, and character.
“Temperament is the transformer for expression of the physical in
general, and of the impulses in particular, upward to the level we call
character.” “Vice versa temperament is the transformer from the
upper level, of the impulses and the physical.”

“It is impossible to draw boundaries between the normal and the
psychopathic. There are only fluid transitions. Characteristics,
manifestations, and reaction patterns, employed in the conception of
the psychopathic, obey biologically and psychologically the same
laws as the analogous manifestations in the normal.” “The psycho-
pathic then is differentiated not qualitatively but only quantitatively
from the normal.” “By psychopathic personalities we understand
those personalities which are characterized by quantitative peculiarities
in the impulse, temperament, or character strata. The degree of the
peculiarities is relative. It is dependent upon the totality of the
individual personality. This definition avoids every unnecessary
limitation and makes possible the connection with the genetic and
clinical together with the teleological point of view.” “The evaluation
of a personality from the outside, the objective valuation, is concerned
with the realm of aim and purpose, that is, with character.” “With-
out evaluations a teleological consideration of the personality is
inconceivable.” “By psychopathic personalities we understand those
discordant personalities which on the causal side are characterized by
quantitative peculiarities in the impulse, temperament, or character
strata and in their unified goal-striving activity are impaired by
quantitative deviations in the ego- and foreign (social) valuation.”

This discussion of the concept of personality in general, and of the 
concept of psychopathic personality in particular is followed by a 
systematic discussion and analysis of different psychopathic types 
based on extensive clinical experience. 

The student of psychology will find this book enlightening in its
discussion of the problem of personality, whether normal or
psychopathic, and fair-minded in its evaluation of the work of others in
this field, where there have been many one-sided and exaggerated points
of view through attempts at oversimplification. To the psychiatrist
and psychologist working in the field of mental hygiene the book
presents a well systematized basis for understanding the clinical
problems of the psychopath.

FRANCIS N. MAXFIELD.
Ohio State University.